


DOCUMENT RESUME 



ED 445 403 



EA 030 609 



AUTHOR: Yap, Kim; Aldersebaes, Inge; Railsback, Jennifer; Shaughnessy, Joan; Speth, Timothy
TITLE: Evaluating Whole-School Reform Efforts: A Guide for District and School Staff. Second Edition.
INSTITUTION: Northwest Regional Educational Lab., Portland, OR.
SPONS AGENCY: Office of Educational Research and Improvement (ED), Washington, DC.
PUB DATE: 2000-08-00
NOTE: 173p.
CONTRACT: S283A50041-99C
PUB TYPE: Guides - Non-Classroom (055)
EDRS PRICE: MF01/PC07 Plus Postage.
DESCRIPTORS: *Educational Administration; Educational Change; Elementary Secondary Education; *Formative Evaluation; *Program Evaluation; School Organization; School Restructuring
IDENTIFIERS: Comprehensive School Reform Demonstration Program



ABSTRACT 



This guidebook provides evaluation assistance to district
and school staff. It was published in response to the Comprehensive School
Reform Demonstration (CSRD) Program, passed by Congress in 1997 to provide
incentives and support for low-performing, high-poverty schools. The guide is an
attempt to help ensure that schools conduct evaluation of whole-school reform
efforts in a way that provides valid and useful information for
accountability and program improvement. The guide does not examine the
philosophical underpinnings of evaluation issues. Rather, it provides
guideposts that district and school staff can consider in choosing an
approach to evaluating their school-reform efforts. It is intended for use by
school staff at sites that have stated goals for student achievement and have
already decided on one or more comprehensive strategies for reaching their
goals. The text is arranged in a "train the trainer" format and is organized
so as to assist in the design of a professional-development workshop. The
book focuses on implementation evaluation and impact evaluation. Various
design samples are also included to help schools customize their evaluation
efforts. Each chapter includes handouts, small-group activities, transparency
masters, and step-by-step instructions for creating an effective evaluation.
A list of print and online resources appears in the back. (RJM)











Second Edition 
August 2000 









U.S. DEPARTMENT OF EDUCATION
Office of Educational Research and Improvement
EDUCATIONAL RESOURCES INFORMATION CENTER (ERIC)

[x] This document has been reproduced as received from the person or organization originating it.

[ ] Minor changes have been made to improve reproduction quality.

Points of view or opinions stated in this document do not necessarily represent official OERI position or policy.



Northwest Regional Educational Laboratory

© 2000 NWREL, Portland, Oregon



Permission to reproduce in whole or in part is granted with the stipulation that the Northwest Regional 
Educational Laboratory be acknowledged as a source on all copies. 

The contents of this publication were developed under Grant No. S283A50041-99C from the U.S. Department 
of Education. However, the contents do not necessarily represent the policy of the Department of Education, 
and endorsement of the contents by the federal government should not be implied. 






Evaluating Whole-School 

Reform Efforts 



A Guide for District and School Staff 



Second Edition 
August 2000 



Kim Yap 

Inge Aldersebaes 
Jennifer Railsback 
Joan Shaughnessy 
Timothy Speth 

Comprehensive Center, Region X 







Northwest Regional Educational Laboratory 
101 SW Main, Suite 500 Portland, Oregon 97204 



Table of Contents 




Acknowledgments

Introduction

Overview 3 

Transparencies 11 

Implementation Evaluation 

Transparencies 35 

Handouts ^6 

Impact Evaluation 49 

Transparencies 87 

Handouts 101 

Design Sample 109

Transparencies 115 

Handouts 118 

Resources 125 

Print 125 

Online Publications and Resources 128 

Technical Assistance Providers 129 

Figures

Figure 1. Levels of Use Related to Instructional Implementation 23

Figure 2. Evaluation Model 50

Figure 3. Pretest-Posttest Model 51

Figure 4. Comparison Group Model 54

Figure 5. Regression Model 57

Figure 6. Control Group Model 60 

Figure 7. Percent of Students Meeting Mathematical Benchmarks 77 

Tables 

Table 1. Advantages and Disadvantages of Evaluation Models 62 

Table 2. Data Collection Matrix 67 





Acknowledgments 



This guidebook is the result of
a collaborative effort among 
many individuals who are inter- 
ested in the improvement of 
teaching and learning through 
comprehensive school reform. 

At the U.S. Department of Ed- 
ucation, Patricia Gore, Sharon 
Saez, and Joyce Murphy pro- 
vided leadership and support 
that made the project possible. 

At the Northwest Regional 
Educational Laboratory, Rex 
Hagans and Steve Nelson re- 
viewed our original proposal 
for developing the guidebook 
and graciously provided staff
support to ensure the success- 
ful completion of the project. 
Jane Grimstad assisted with 
the design of the guidebook. 

A conference was held in Port- 
land on September 24, 1999, to 
review a preliminary draft of the
guidebook and to obtain feed- 
back for revision and refinement. 
We wish to thank all confer- 
ence participants, including 
the Comprehensive Center ad- 
visory committee members, who 
shared their ideas, observations, 
and insights that have enriched 
the content and enhanced the 
format of the guidebook. 



A special note of thanks is 
extended to Jana Potter who 
designed the process for ob- 
taining maximum input from 
participants at the conference. 
We are grateful to Georgia Pflei- 
derer and Audrey Trubshaw for 
their assistance with logistical 
arrangements for the conference. 

Finally, we wish to thank Lee 
Sherman and Catherine Paglin 
for their superb editorial help 
and Marjorie Wolfe and Sharmon 
Hillemeyer for their excellent 
assistance with word processing, 
design, and production of the 
guidebook. 



Introduction

In the fall of 1997, Congress
set in motion a federal initia- 
tive to jump-start comprehen- 
sive reform in the nation's 
schools. The Comprehensive 
School Reform Demonstration 
(CSRD) Program provided incen- 
tives and support for schools, 
particularly high-poverty, low- 
performing schools, to develop 
and implement comprehensive 
school reform efforts. These 
schools are to carry out reform 
activities based on reliable re- 
search and effective practice. In 
the fall of 1998, Congress ex- 
tended CSRD funding for a sec- 
ond year. 

The current Title I legislation 
also provides an incentive for 
schools serving a high concen- 
tration of poor children to en- 
gage in whole-school reform. 
Title I schoolwide programs, 
implemented in schools with 
at least 50 percent of students 
in poverty, have the flexibility 
of pooling resources from other 
federal programs to plan and 
implement schoolwide improve- 
ment activities. 

The most recent U.S. Depart- 
ment of Education estimate in- 
dicates that there are 1,600 
schools participating in the 
Comprehensive School Reform 
Demonstration (CSRD) Program 
across the nation. In addition, 
approximately 15,000 Title I 
schools are implementing school- 
wide programs. Each of these 
whole-school reform efforts is 
to be evaluated to assess its im- 
pact on teaching and learning. 



The Education Department has 
issued general guidance to help 
district and school staff evaluate 
CSRD and Title I schoolwide 
programs. This guidebook, de- 
veloped collaboratively by the 
Comprehensive Center and the 
CSRD work unit at the Northwest 
Regional Educational Laboratory, 
is intended to provide further 
evaluation assistance to district 
and school staff. It is an attempt 
to help ensure that schools con- 
duct evaluation of whole-school 
reform efforts in a way that pro- 
vides valid and useful informa- 
tion for accountability and 
program improvement. 

The guidebook is not intend- 
ed to be a philosophical discus- 
sion of evaluation issues. Nor is 
it designed to be a cookbook on 
the evaluation of whole-school 
reform efforts. Users who have 
no prior training or experience 
with program evaluation will not 
become skilled evaluators by 
reading the document. Rather, 
it is our intention to provide 
some guideposts that district 
and school staff can consider 
in choosing an approach to 
evaluating their school reform 
efforts. We hope that this guide- 
book will help raise awareness 
of the complexity of program 
evaluation in general and the 
evaluation of whole-school re- 
form efforts in particular. 










Overview 



The Purpose of This Guide 



The intention of this guidebook
is to increase understanding
about how to design
and implement an evaluation 
plan that will help answer 
questions about program qual- 
ity and effectiveness in accom- 
plishing school improvement 
goals. Rather than turning to 
outside sources for evaluation 
expertise, schools can build 
their own knowledge and skills 
about how to evaluate whole- 
school reform efforts. As a re- 
sult, schools will gain confidence 
in their ability to demonstrate 
that their efforts are making a 
difference in student achieve- 
ment, as well as meet growing 
accountability requirements. 

This guide is to be used by 
school staff at sites that have 
already specified goals for stu- 
dent achievement (as required 
in most grant applications), 
and have also decided on one 
or more comprehensive strate- 
gies for reaching their goals. 
Once this preliminary planning 
work has been done, the school 
will be in a position to draw 
upon the information pre- 
sented in this guidebook to de- 
velop a useful evaluation plan. 



The guidebook has been 
planned to assist in the design 
of a professional development 
workshop. It is arranged in a 
"train the trainer" format. The 
hope is that those responsible 
for evaluation will use this guide 
to provide staff development for 
all individuals who are engaged 
in comprehensive school reform, 
with the purpose of increasing 
their knowledge and involvement 
in the evaluation process. 

A wealth of information and 
activities is organized into spe- 
cific sections that can be pre- 
sented together or separately, 
depending on the needs of the 
workshop audience. Workshop
audiences can vary; possible
participants include an entire
school staff, a leadership team
that is responsible for the implementation
of a CSRD Program,
or the principals of Title I schoolwide
schools within a district.
Each section furnishes the
presenter with an explanation 
of various aspects of evaluation 
design and process, instructions 
for carrying out the workshop, 
and corresponding activities 
and transparencies. 



Who Is Responsible 
for Evaluation? 

Often, school staff ask,
"Who should be the evaluator?"
The greatest benefits
from evaluation are realized 
when the school takes owner- 
ship of evaluation and uses the 
findings to stimulate change 
that makes a difference in how 
they go about comprehensive 
reform. For this reason, the 
best answer to this question is 
an evaluation team composed 
of representatives from the 
whole-school community. We 
highly encourage schools to in- 
clude any group or person that 
has an investment in either the 
implementation or results of its 
school reform efforts. 

An internal evaluation team 
will increase the likelihood that 
the evaluation plan will be ad- 
ministered well. The evaluation 
team's responsibility begins with 
designing a relevant evaluation 
plan that addresses their infor- 
mation needs and grant require- 
ments. This requires generating 
enthusiasm and support in the 
school community for the evalu- 
ation plan. A significant role the 
evaluation team will be assigned 






School Community: all individuals and groups who have a vested interest in the school, for example, students,
parents, teachers, local employers, principals, or school board members

Program Evaluation: the use of various methods to determine the degree to which a program has been developed
and implemented as planned and has accomplished its stated goals and objectives

Formative Evaluation: the monitoring of activities and strategies that take place during the development and
implementation of a program, which informs stakeholders about possible program adjustments to improve quality
and effectiveness

Summative Evaluation: evaluation of the ultimate results of a program, asking the question, "Has the program
accomplished what it intended?"







is the administration of the 
evaluation plan, which involves 
identifying and developing in- 
struments, collecting necessary 
data, analyzing and interpreting 
data, and reporting results to 
all stakeholders. Dedicating 
sufficient time and resources 
is essential to the success of an 
evaluation plan. The evaluation 
team will need time to oversee 
the evaluation process, immerse 
themselves in the data, and en- 
sure that findings are considered 
throughout program implementation
and converted into constructive
changes that strengthen
school improvement efforts.

Another consideration for 
evaluation is when to use external
expertise, that is, an outside
evaluator. Outside evaluators
can provide technical guidance
during the design, analysis,
and reporting phases of
evaluation. The evaluator's as- 
sistance will help ensure that 
evaluation plans are relevant 
and realistic. The collection of 
data is typically left to the proj- 
ect staff because of the ease 
and day-to-day access they 
generally have to data. A col- 
laborative working relationship 
between the outside evaluator 
and internal evaluation team 
merges the best of two view- 
points. An outside evaluator 
brings an objective perspective 
to the process and can more 
easily ask the difficult, reflec- 
tive questions that can be 
missed or avoided by those who
are implementing the compre- 
hensive reform program. Further, 
the project staff comprehend the 
data best and can attach mean- 
ing to the numbers generated by 
an evaluation. By collaborating 
with an outside evaluator, schools 
can overcome some of the typi- 



cal obstacles (fear, lack of ex- 
perience in evaluation, time 
limitations) they face when 
planning and implementing 
evaluation plans, thus increas- 
ing the feasibility of their eval- 
uation strategies. 

Evaluation
Requirements

Specific evaluation requirements
for state or federal
grants have been purposely 
left out of this guide. We have 
chosen not to address such re- 
quirement issues because often 
the requirements are specific to
a grant, differ from year to year, 
and vary from program to pro- 
gram as well as from state to 
state. For these reasons, it would 
be very difficult to accurately 
address requirement issues 
around evaluation. The best 
approach to ensure that your 
school's evaluation plan meets 
program/grant specific require- 
ments is to contact your state's 
educational agency. 

Context of Comprehensive 
School Reform 

Comprehensive school reform
and Title I schoolwide programs
are well underway across
the nation. Along with being 
responsible for restructuring 
their operational systems, 
schools increasingly are being 
held accountable for the results 
of their whole-school reform ef- 
forts. Federal and state educa- 
tion officials are asking several 
significant questions: (1) Are 
comprehensive school reform ef- 
forts producing positive results 
in student achievement? (2) Are 
comprehensive school reform 



programs being implemented as 
planned and with fidelity to the 
adopted model? and (3) Will 
state and local policies and 
practices sustain comprehensive 
school reform? These questions 
should drive evaluation efforts. 
(Overview Transparency #1) 




The overarching goals of eval- 
uation are twofold: to inform 
schools about what is and isn't 
working, and to guide decisions 
about program adjustments and 
improvements, thereby increasing
the likelihood of positive
impact. 

Program evaluation is a sys- 
tematic process designed to 
gauge the quality and effec- 
tiveness of a program. Evalua- 
tion produces information that 
helps monitor progress and solve
problems to enhance program
implementation and impact.
Evaluation is most meaningful
when it is integrated early into
the program design. Tacking it on
at the end of a program seldom 
yields useful findings. (Over- 
view Transparency #2) 

There are two basic types of 
evaluation, each with its dis- 
tinct purpose. "Formative" 
evaluation produces informa- 
tion used to improve a program 
during its operation. It gener- 
ates information that guides 




decisionmaking about the pro- 
gram's desirability, feasibility, 
fidelity, and soundness in pro- 
ducing desired results (Nelson, 
1999; Sarvela & McDermott, 
1993). "Summative" evaluation, 
on the other hand, garners data
necessary for judging the ulti- 
mate success of the entire pro- 
gram (Sarvela & McDermott, 
1993). Its major purpose is to 
answer the question, "Did the 
program do what it promised?" 
(Overview Transparency #4) 

Often, evaluation focuses only 
on results. But without data on 
program implementation, it is 
difficult to link student out- 
comes to the program or to 
make timely adjustments to 
enhance program effectiveness. 
With ongoing and well-thought- 
out program evaluation, a school 
community can construct a compelling
case that its comprehensive
reform efforts did indeed
contribute to the improvement 
of its students' academic per- 
formance. 

A number of assumptions 
guide program evaluation 
(Northwest Regional Educational
Laboratory [NWREL],
2000). It should: 



■ Be comprehensive enough 
to reflect decisionmaking 
needs and provide timelines 
for ongoing, immediate feed- 
back for continuous program 
improvement 

■ Use a multimethod approach 
to enhance the validity of data 

■ Provide sound information 
regarding outcomes and ef- 
fectiveness in achieving ex- 
pected program outcomes 

■ Employ a combination of 
quantitative and qualitative 
strategies 

Program
Implementation

Research has consistently
shown that the depth and
quality of program implementation
are powerful factors in the
success of school reform pro- 
grams. Comprehensive reform 
efforts can succeed if they are 
implemented well. In particu- 
lar, schools should pay atten- 
tion to how widely staff 
members embrace the program 
and how well they understand 
it. Schools should ask, "Is the 
program being implemented as 
intended?" Research has identi- 
fied nine program components 
(see sidebar on page 6) that 
contribute to the quality of 



a comprehensive reform program 
and are influential in helping 
improve student achievement. 
Careful monitoring of these nine 
components provides insight into 
what factors help or hinder re- 
form efforts. These components 
can provide a useful framework 
for gathering, interpreting, and 
using data to make decisions 
about implementation progress 
and challenges. The specific 
evaluation questions that guide 
the process and determine which 
data collection strategies to use 
are (Sarvela & McDermott, 1993): 
(1) Which intervention activities 
are being used? (2) Is the inter- 
vention being implemented with 
fidelity? (3) What is working? 
(4) What should be improved? 
and (5) How should it be refined? 
Answers to these questions help 
determine how a school's reform 
program is making a difference. 
Linking achievements to compre- 
hensive school reform efforts is 
then possible. (Turn to Page 19 
for detailed discussion.) 

Program Outcome 
and Impact 

Summative evaluation involves
gathering the evidence
necessary to determine
overall program success in im- 
proving student achievement. 
The evaluation question driv- 
ing this portion of the investi- 
gation is, "Are we achieving 
what we aspired to do?" In the 
context of comprehensive 
school reform, program success 
is measured by how well the 
school stacks up against state 
standards and local assessment 
measures. 



Many reasons and benefits warrant conducting program evaluations,
including (Overview Transparency #3):

■ Strengthening program design by clearly articulating shared goals
and objectives

■ Facilitating informed decisionmaking about improving the quality
of the program

■ Contributing to making constructive changes to enhance program
effectiveness

■ Helping identify and celebrate successes when desired outcomes
are achieved

■ Reinforcing the link between schoolwide program strategies and
student outcomes







There are two basic forms of 
summative evaluation: outcome 
evaluation and impact evalua- 
tion. (Overview Transparency #6) 
Outcome evaluation examines 
immediate changes in knowl- 
edge, skill, attitude, and be- 
havior. Impact evaluation, on 
the other hand, demonstrates 
the program's long-term effects 
(Muraskin, 1993). Here's an example:
A school gives a parent
workshop about the value of 
reading to children at home. 

The program outcome would 
be the new knowledge parents 
gained from their participation. 
This direct effect — increased 
parental knowledge — is an im- 
mediate result that may lead to 
increased reading with children 
at home. This in turn leads to 
a positive impact on academic 
achievement. In the world of 
evaluation, both new parental 
knowledge and more reading 
at home would be considered 
program outcomes. Improved 
reading achievement would 
be considered a long-term 
program impact. 

Routinely monitoring out- 
comes is beneficial because it 
provides frequent feedback to 
those involved in decisionmak- 
ing about the program. Knowl- 
edge gained from monitoring 
outcomes can gauge progress, 
uncover problems, help appro- 
priately allocate resources, and 
acknowledge successes (Pane, 
Mulligan, Ginsburg, & Lauland, 
1999). For example, if a pro- 
gram objective is to increase 
reading scores on the state 
assessment by 10 percent over 
the next three years, outcomes 
(such as improved reading skills) 
will help determine whether the 
school is moving in the desired 
direction. Program outcomes are 



results that are related to an 
objective but that occur more 
immediately. Knowing precisely 
what outcomes the school is 
looking for will help ascertain 
which data sources contain the 
desired information. This can 
help schools avoid the common 
error of collecting unneeded data
that can hike costs and waste 
time. (Turn to Page 49 for de- 
tailed discussion.) 
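
The reading-score example above can be sketched in a few lines of code. This is only an illustration under stated assumptions: the objective is read as a 10 percent relative gain over the baseline average score, progress is expected to follow a straight-line pace across the three years, and the function names and score values are hypothetical rather than taken from any state assessment.

```python
def progress_toward_target(baseline, current, target_increase=0.10):
    """Share of the targeted gain achieved so far (1.0 means the target is met).

    Assumption: the objective is a relative gain over the baseline score.
    """
    if baseline <= 0 or target_increase <= 0:
        raise ValueError("baseline and target_increase must be positive")
    targeted_gain = baseline * target_increase
    return (current - baseline) / targeted_gain


def on_track(baseline, current, years_elapsed, total_years=3, target_increase=0.10):
    """Compare achieved progress with a straight-line pace (an assumption)."""
    expected_share = years_elapsed / total_years
    return progress_toward_target(baseline, current, target_increase) >= expected_share


# Hypothetical figures: baseline average of 200, average of 208 after year one.
share = progress_toward_target(200, 208)
print(f"Share of targeted gain achieved: {share:.0%}")
print("On track after year 1:", on_track(200, 208, years_elapsed=1))
```

A school might run a check like this at each interim checkpoint, using whatever outcome measure it has agreed on, so that a shortfall is noticed while there is still time to adjust the program.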



These components, when integrated into comprehensive school
reform plans, enhance the quality and effectiveness of a program
(Overview Transparency #5):

■ Innovative strategies and proven methods that are based
on reliable research and replicated successfully in schools with diverse
characteristics

■ A comprehensive design for effective school functioning

■ Measurable goals for student performance and benchmarks for
meeting those goals

■ Commitment and support of school staff and community

■ Meaningful involvement of parents and local community

■ High-quality external technical support and assistance

■ Evaluation plan for monitoring program implementation and
assessing results in student achievement

■ Coordinated resources to maximize and sustain the school
reform effort

■ High-quality and continuous teacher and staff professional
development






Evaluation Design 
and Process 

How does a school design a
comprehensive evaluation 
plan that meets federal and 
state requirements, and also 
satisfies its own informational 
needs? By addressing certain 
key questions early in program 
planning, the evaluation pro- 
cess will reflect the needs, in- 
terests, issues, and resources 
unique to the school (Sarvela 
& McDermott, 1993; Western 
Regional Center, 1995). Ques- 
tions that schools should ask 
of themselves are (Overview 
Transparency #7): 

■ What does our school want 
to accomplish overall? 

This requires clearly articulating 
goals and transforming them 
into specific, measurable objec- 
tives. Setting goals and objec- 
tives is difficult. Your school 
must first consider current con- 
ditions, needs, academic con- 
cerns, and resources. Creating 
a snapshot of your school can 
help you avoid the common 
pitfall of setting goals and 
objectives that are unrealistic 
given the available resources. 
The value of conducting a thor- 
ough needs assessment cannot 
be overemphasized. It will clar- 
ify issues, pinpoint priorities, 
and identify resources. 

■ What will our school have to 
do to achieve these goals and 
objectives? 

This is the stage when your 
school decides on specific strate- 
gies and activities to create the 
desired changes. This is when 
you determine how program 
goals and objectives are translated
into research-based actions
and strategies. Actions and strategies
should match goals and
needs. Without that match, 
your school will have a tough 
time reaching its objectives. 

■ How will our school know 
that its program is succeeding 
at accomplishing its goals and 
objectives? 

Schools can gauge progress to- 
ward their goals by selecting 
program and student perfor- 
mance measures that are 
meaningful, measurable, and 
relevant — that is, related to 
program objectives. Perfor- 
mance indicators will provide 
the information needed to 
demonstrate program success. 
It's best to measure progress 
annually and at interim check- 
points (say, quarterly). With 
regular monitoring, your school
can uncover barriers to success 
and devise new strategies as 
you go along. 

■ How will evidence be gath- 
ered to demonstrate progress 
toward our school's goals? 

At this point schools need to 
decide which data collection 
methods they will use to ac- 
quire relevant information. 
(Turn to Page 26 for detailed 
discussion.) Typically, schools 
have a wealth of information at 
hand because they are continu- 
ally gathering data for various 
purposes. For this reason, 
schools can begin by building 
on existing systems, adding only 
data collection methods that 
will fill information gaps. Data 
collection methods are many. 
They include document review, 
surveys, interviews, focus 
groups, observation, and student
achievement assessments.
Ideally, schools will choose to
use multiple data gathering
procedures to improve the
credibility of their data. For
example, changes in teaching
practices can be assessed in
several ways: administering a
survey to students, observing
classroom practices, or conducting
a focus group with
teachers. Using two or three
data collection methods, measurement
instruments, or data
sources is a technique called
"triangulation." Each data gathering
method has advantages
and disadvantages. (Turn to Page
67 for Data Collection Matrix.)

Triangulation: confirming data credibility by using multiple
data-gathering methods or multiple sources of data.

Disaggregation of data: comparing subgroups based on demographic
characteristics and educational experiences that are deemed important.
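
One way to picture triangulation is as a simple cross-check of what each data collection method reports about the same question. The sketch below is a hypothetical illustration, not a procedure prescribed by this guide: the method names, the yes/no coding of each finding, and the category labels are all assumptions made for the example.

```python
def triangulate(findings, min_sources=2):
    """Summarize agreement across data collection methods.

    `findings` maps a method name to True or False, meaning that the method
    did or did not show the change in question (a simplifying assumption;
    real findings are rarely this clear-cut).
    """
    if len(findings) < min_sources:
        return "insufficient sources"
    supporting = sum(1 for observed in findings.values() if observed)
    if supporting == len(findings):
        return "corroborated"
    if supporting == 0:
        return "not supported"
    return "mixed evidence"


# Hypothetical question: have teaching practices changed?
evidence = {
    "student_survey": True,
    "classroom_observation": True,
    "teacher_focus_group": False,
}
print(triangulate(evidence))
```

A "mixed evidence" result is not a failure. It signals that the evaluation team should look more closely at why the sources disagree before drawing conclusions.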

■ How will our school determine 

what the data are telling us? 

Making sense of the data col- 
lected becomes essential if the 
findings are to be used to in- 
fluence decisions and future 
planning about the school's 
comprehensive reform efforts. 
Interpretation of the data is 
best accomplished when it is 
reviewed by the school's staff 
and community, in particular 
those who are responsible for 
the day-to-day implementation 
of the program. Data analysis 
is an inquiry process meant to 
help schools examine and bet- 
ter understand the nature and 
effectiveness of their school
improvement program. The following
are reflective questions
that can help guide discussions
(Holcomb, 1999; Levesque, Bradby,
Rossi, & Teitelbaum, 1998):
(1) What do these data reveal?
(2) What else might explain
these results? (3) What else do
we need to know to better understand
the data before we
draw conclusions? (4) What
good news is here for us to
celebrate? and (5) What needs
to be done to improve program
performance and effectiveness?

■ How will our school use 

evaluation results? 

To maximize the benefits of 
evaluation, schools should es- 
tablish an ongoing process to 
review, interpret, and commu- 
nicate results. In this way, 
schools can keep the school 
community informed about the 
program's quality and effectiveness.
Sharing successes generates
enthusiasm, involvement,
and commitment to the reform 
program. 

The same people who are im- 
plementing the program should 
collect and interpret the data. 
In this way, they will get im- 
mediate feedback to inform 
daily decisions about program 
operations and classroom prac- 
tices. Besides getting ongoing 
feedback, the school staff and 
community gain a sense of 
ownership by direct involve- 
ment. Ownership develops in- 
trinsic motivation to carry out 
the evaluation plans, interpret 
results, draw conclusions about 
program progress, and pursue 
improvements. Most of all, it fosters
trust that data will be used
in a positive, not punitive, way. 







Common Barriers to the Collection and Use of Evaluation Data 

■ Challenge of collaboration 

■ Lack of time 

■ Lack of proper training in practical program evaluation 

■ Fear of evaluation 



The school's evaluation plan 
arises from thoughtful consid- 
eration of these questions. 
Well-designed evaluations are 
invisible, becoming imbedded 
in daily routines. The most use- 
ful evaluation plans are those 
that are tailored to the unique 
needs and context of the reform 
program. The best plans glean 
relevant information about pro- 
gram performance and student 
achievement that will contribute 
to maximizing the program's 
effectiveness. 

To make sure their evaluation 
plan succeeds, schools must ad- 
dress the reasons people resist 
evaluation. Common barriers to 
the collection and use of evalu- 
ation data include: 

■ Challenge of collaboration. 
Staff, parents, and adminis- 
trators often lack not only 
sufficient time to work col- 
laboratively but also the skills 
and experience to work coop- 
eratively. 

■ Lack of time. The most com- 
mon obstacle is the shortage 
of time to successfully plan 
and implement evaluation. 
Many teachers already feel 
overwhelmed, and the thought 
of one more thing to do can 
be daunting. 

■ Lack of proper training in 
practical program evaluation. 
Few have the knowledge, 
skills, or confidence to con- 
duct program evaluations or 



the understanding of how to 
use data to guide decisions. 

■ Fear of evaluation. Many 
educators fear that data will 
be used against schools by 
exposing inadequacies and 
jeopardizing funding. This 
fear stems mainly from a mis- 
perception about the purpose 
and function of evaluation. 

Use of Data for 
Program Improvement 

Evaluation is meaningless
unless data are collected, 
reviewed, analyzed, and dis- 
seminated quickly and effi- 
ciently. Only when results are 
fed back into the system are 
they useful. The process of in- 
terpreting and reporting evalu- 
ation results is most mean- 
ingful when it is part of an 
ongoing, evolving process that 
engages all interested people. 
Schools must invest time to 
review and interpret results 
in order to realize the benefits 
of evaluation. 

Whenever possible, data should 
be disaggregated — that is, bro- 
ken down by categories such as 
gender, ethnicity, student type, 
and grade level. By disaggre- 
gating data, schools can zero 
in on areas of strength and 
weakness. Disaggregation of 
data also helps schools better 
understand the program's im- 
pact, in addition to addressing 
equity issues (Yap, 1997). 
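
As a minimal sketch of disaggregation, the Python example below groups a handful of hypothetical student records by a chosen category and reports the average score for each group. The field names (`grade`, `gender`, `score`) and all values are illustrative assumptions, not data from this guide.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical student records; field names and values are illustrative only.
RECORDS = [
    {"grade": 3, "gender": "F", "score": 72},
    {"grade": 3, "gender": "M", "score": 64},
    {"grade": 4, "gender": "F", "score": 81},
    {"grade": 4, "gender": "M", "score": 79},
    {"grade": 4, "gender": "F", "score": 85},
]


def disaggregate(records, category, measure="score"):
    """Return the mean of `measure` for each value of `category`."""
    groups = defaultdict(list)
    for record in records:
        groups[record[category]].append(record[measure])
    return {value: mean(scores) for value, scores in groups.items()}


if __name__ == "__main__":
    # The same records, broken down two different ways.
    print(disaggregate(RECORDS, "grade"))
    print(disaggregate(RECORDS, "gender"))
```

Running the same function with a different category name lets a school look at one data set from several angles without re-entering the data.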



Strengthening Programs 
Through Evaluation 

Evaluation is a powerful tool
that can reveal what is ac- 
tually occurring in schools. It 
can sift through the maze of 
school reform efforts to un- 
cover what is truly working to 
change the learning environ- 
ment. It can reveal the root 
causes of schools' struggles so 
that the real problem — not just 
the symptoms — can be tackled. 
It can also bring to light factors 
that contribute to positive re- 
sults so that schools can con- 
tinue to improve teaching and 
learning. 

No strand of a school — from 
curriculum and instruction to 
facilities operation, staff devel- 
opment, and administration — 
goes untouched in the school- 
wide reform process. The goal 
is to deliver a coherent, sound 
education that will bring high 
standards within reach for each 
and every child. Evaluation is 
the means of finding out where 
your school has been, where it's 
going, how it's getting there, 
and — most important — whether 
it's on target to reach its desired 
destination. If goals and prac- 
tices are out of sync, evalua- 
tion can point the way to get 
back on track. 

In the following sections of 
this guide, your school commu- 
nity will find the step-by-step 
guidance it needs to plan, de- 
sign, and carry out effective 
evaluation of your comprehen- 
sive school reform program. 






References 



Ary, D., Jacobs, L.C., & Razavieh, A. (1996). Introduction to research 
in education (5th ed.). Fort Worth, TX: Harcourt Brace College. 

Holcomb, E.L. (1999). Getting excited about data: How to combine 
people, passion, and proof. Thousand Oaks, CA: Sage. 

Levesque, K., Bradby, D., Rossi, K., & Teitelbaum, P. (1998). At your 
fingertips: Using everyday data to improve schools. Berkeley, CA: 
MPR Associates, Arlington, VA: American Association of School 
Administrators, & Berkeley, CA: National Center for Research in 
Vocational Education. 

Muraskin, L. (1993). Understanding evaluation: The way to better 
prevention programs. Rockville, MD: Westat. 

Nelson, S. (1999, June). Principles of evaluation. Paper presented 
at Charter Schools Leadership Training Academy, Portland, OR. 

Northwest Regional Educational Laboratory. (2000). Developing 
your school's CSRD evaluation plan: An awareness workshop for 
local schools [Training materials]. Portland, OR: Author. 

Pane, N., Mulligan, I., Ginsburg, A., & Lauland, A. (1999). A guide 
to continuous improvement management (CIM): For 21st century 
community learning centers. Washington, DC: U.S. Department of 
Education. 

Sarvela, P.D., & McDermott, R.J. (1993). Health education evalua- 
tion and measurement: A practitioner's perspective. Madison, WI: 
Brown & Benchmark. 

Western Regional Center for Drug-Free Schools and Communities. 
(1995). Systemic evaluation: A new approach to assessing the ef- 
fects of tobacco, alcohol, and other drug (TAOD) programs. Port- 
land, OR: Northwest Regional Educational Laboratory. 

Yap, K.O. (1997). Guidebook on developing performance indicators. 
Portland, OR: Northwest Regional Educational Laboratory. 



Instructions for Overview 
Transparencies

Each transparency is related
to the Overview section of
the guidebook. Becoming familiar
with the contents of this
section will help guide your
use of the transparencies. This
section of the guidebook and
the corresponding transparencies
provide a conceptual overview
with brief descriptions of critical
elements of program evaluation.
More in-depth discussions
and examples of how to design
and plan for program evaluation
will be presented later in
the guidebook.

Transparency #1 

Sets the stage for understanding 
the significant overall questions 
driving comprehensive school 
reform evaluation. Briefly dis- 
cuss as described on Page 4 in 
the guidebook. 

Transparency #2 

Describes the overall purpose of 
program evaluation. Distinguish- 
ing between the two dimensions 
of formative (implementation) 
and summative (impact) evalu- 
ation is useful in helping un- 
derstand the unique purpose 
of each. Briefly discuss as de- 
scribed on Pages 4-5 in the 
guidebook. 

Transparency #3 

Outlines the benefits of evalua- 
tion with particular attention to 
its value in guiding decisions to 
improve the effectiveness of 
the comprehensive reform pro- 
gram. Briefly discuss as described 
on Pages 5-6 in the guidebook. 






Overview 



Transparency #4

Provides a brief comparison of 
formative and summative eval- 
uation purpose and data collec- 
tion methods. 

Transparency #5 

Discusses nine components of ef- 
fective comprehensive school re- 
form. Briefly discuss as described 
on Page 6 in the guidebook. 

Transparency #6 

Introduces program outcome 
and impact evaluation. Briefly 
discuss as described on Pages 
5-6 in the guidebook. 

Transparency #7 

Introduces questions that facil- 
itate the planning of program 
evaluation. Briefly discuss as 
described on Pages 6-7 in the 
guidebook. 

Transparency #8 

Introduces the common barri- 
ers that often confront schools 
when planning and implement- 
ing evaluation plans. Briefly 
discuss as described on Page 
8 in the guidebook. 



Overview Transparency #1: The Significant Questions Driving Evaluation (Northwest Regional Educational Laboratory)



Overview Transparency #2 (Northwest Regional Educational Laboratory)





Overview Transparency #3 (Northwest Regional Educational Laboratory)




Overview Transparency #4: Comparisons of Formative and Summative Evaluation (Northwest Regional Educational Laboratory)





Overview Transparency #5: Nine Comprehensive Components (Northwest Regional Educational Laboratory)




Overview Transparency #6: Program Outcome and Impact (Northwest Regional Educational Laboratory)





Overview Transparency #7 (Northwest Regional Educational Laboratory)




Overview Transparency #8: Common Issues and Challenges (Northwest Regional Educational Laboratory)





Implementation Evaluation 



Let's imagine that your
school has chosen a new 
comprehensive or schoolwide 
program. If your school is seri- 
ously committed to getting this 
program in place, then the im- 
plementation efforts cannot be 
left to chance. A process for 
verifying progress will have to 
be an integral part of the work. 
A critical initial step is to plan 
an evaluation that will trans- 
port detailed information about 
program implementation back 
to the program planners. This 
type of evaluation — collecting 
and using data to feed back 
into the program on an ongo- 
ing basis — is called formative 
evaluation. 

Formative evaluation serves 
two purposes: 

1. To determine whether the
program is being implemented
as the program developers designed
it and whether the most
vital components of the program
are in place

2. To enable staff to retool and 
fine-tune their efforts to make 
a program work at a specific site 

A strong formative evaluation 
can help a program to "hum" at 
a particular school. 

The central question in for- 
mative evaluation is whether 
the model or program is being 
implemented as it was de- 
signed. Comprehensive models 
are grounded in research. But 
no program — no matter how 



sound it is — can have impact if 
its essential elements are not
used. If some staff choose to
use only a portion of a new
program and to selectively abandon
other parts of the program,
they weaken the impact of that
program. This is why systematic
data collection about implementation
is needed. By determining
which program components are
firmly in place and which ones
are only being given lip service,
those managing the new program
can learn about and address
the barriers that are
limiting or interfering with
use. They can also design special
adaptations to meet specific
needs of this school.

In implementation evaluation, 
the data collected are used pri- 
marily for internal reporting to 
the program staff (although 
some grants do require that im- 
plementation data be reported 
to the funding agency¹). To
maximize the potential for pro- 
gram improvement, evaluation 
data about implementation 
must be analyzed quickly, 
shared broadly, and presented in 
a format that can be easily used
to make program modifications. 
Implementation evaluation 
works best when the evaluation 
is seen as an integral part of 
staff development. 



An important decision is to 
identify a team of individuals 
who can collect the implemen- 
tation evaluation data. The 
evaluation of a comprehensive 
program is best done by a team 
of data collectors. This team 
could include external evalua- 
tors, the administrators or staff 



in the building, and parents 
and community members. To 
be effective, members of that 
team need to be able to meet 
regularly with those implement- 
ing the program so there are 
clear lines of communication 
and a thorough understanding 
of the evaluation work. Step- 
by-step guidance for this team 
is presented in the following 
pages. Before getting into that 
level of detail, it is important 
to reiterate the key elements 
that research has shown to pre- 
dict successful implementation. 
Throughout this data collection, 
all involved in program imple- 
mentation need to be aware of 
these factors so they can gather 
evidence to verify that the nec- 
essary supporting conditions 
exist and that specific instruc- 
tional components are making 
it into the classroom to bolster 
the comprehensive reform. 



Formative or implementation evaluation is designed to provide data that will 
refine and improve a program. The purpose of doing such an evaluation is to 
gather adequate data to ensure that a program works in the local context. 






1. Prior to developing an evaluation plan, program managers need to review the evaluation requirements stipulated by their funding agency. They should also 
determine how the data provided in reports will be used. Program managers need to be very clear about whether decisions about continued funding will be made 
based upon the reports they submit. 






Research shows that to be ef- 
fective, comprehensive reform 
needs to: 

■ Be undertaken for the right 
reasons (for example, to solve 
a problem, meet a need, or 
improve student achieve- 
ment), not simply to advance 
the career of an administrator 
or to procure additional funds. 

■ Nurture commitment on the 
part of teachers, preferably by 
involving them from the be- 
ginning in discussions of 
what and how to change. 

■ Provide adequate resources, 
including funds, materials, 
and — most important — time 
for teachers to learn, practice, 
reflect, discuss, observe, eval- 
uate, and assimilate. 

■ Include ongoing profes- 
sional development for teach- 
ers, not depend on a one-shot 
training workshop at the be- 
ginning of implementation. 
Training and coaching should 
be ongoing and should sup- 
port the change of classroom 
practice. 

■ Promote collaboration 
among teachers so they can 
learn from each other and 
help each other work through 
the most difficult aspects of 
change. 

■ Exert pressure on teachers 
who are resistant to change 
and develop approaches that 
channel resistance into pro- 
ductive dialogue. To prevent 
resentment and passive resis- 
tance, this pressure must be 
counterbalanced by continu- 
ous support. 

■ Enable staff to try new and 
messy changes by allowing 
them to make mistakes and 
encouraging them to make 
midcourse adjustments. 




■ Involve parents and com- 
munity members in the re- 
form process. 

■ Ensure that school and dis- 
trict leaders support the 
change in word and deed. 

■ Minimize conflicts with 
other innovations, programs, 
and policies. 

■ Incorporate successful in- 
novations into district policy 
and budgets so that they will 
outlast the inevitable depar- 
ture of key leaders or start- 
up funding (Buechler, 1997). 

One aspect of the implemen- 
tation evaluation is to deter- 
mine if these basic conditions 
are being met. 

Considerations 
in Planning Your 
Implementation 
Evaluation 

The degree and depth of implementation
evaluation a
school is able to undertake de- 
pends on two pragmatic fac- 
tors: amount of funding and 
access to data. Hiring an exter- 
nal evaluator is an excellent 
way to get this work done but 
since implementation evalua- 
tion can be very time inten- 
sive, contracting with an 
outside consultant to do all 
data collection work may be 
more costly than most projects 
can afford. In addition, physi- 
cal distance from the day-to- 
day operations may restrict the 
amount of detailed information 
that an external evaluator can 
collect. For these reasons, many 
programs use a combination of 
external and internal staff to 
collect data. 




In planning this evaluation,
it is also essential for the program
managers to carefully
study grant requirements to determine
the type of evaluation
data that are required and to
know how these data will be
used. Most funders require that
outcome data be reported, so it
is easy for a school to focus
exclusively on this type
of data. Schools must be careful
not to concentrate on end
results to the
detriment of ongoing measures
of implementation. Implementation
measures are critical to
achieving the long-term results
schools seek.



Priorities of the preparatory 
work are: 

■ Addressing staff misunderstand- 
ings about program evaluation 

■ Getting staff connected to the 
evaluation work so that they can 
participate in question generation 
and data collection 



Preparation for 
Evaluation 

This section describes concrete
ways that school staff
can get more engaged in the 
process of posing evaluation 
questions and identifying how 
data will affect program imple- 
mentation. These steps will 
help guide the initial work of 
the implementation evaluation: 

Step 1: Orient the entire staff 
to evaluation issues as early as 
possible. The primary source of 
information for implementation 
evaluation is likely to be front- 
line staff — those who are work- 
ing to put this program into 
place. Since these individuals 



will be supplying information, 
it is crucial that they under- 
stand the purpose of evalua- 
tion and are willing to help 
collect data. Those collecting 
the evaluation data need to 
make sure that all participants 
have been informed about the 
purpose of the evaluation and 
are willing to be cooperative. 

If those collecting evaluation 
data already have the trust 
of the staff, they may want 
to proceed to Step 2. If not, 
we suggest preparatory work 
(described below) to ensure 
that staff are able and willing 
to cooperate fully and provide 
the best information possible 
about program implementation. 

Those conducting the evalua- 
tion need to address any mis- 
understandings or reservations 
staff may hold about the process 
of evaluation. When personal 
concerns about evaluation have 
been discussed, staff will be 
more willing to provide honest 
data. The following issues often 
crop up:²

■ Staff equate program evalu- 
ation with personnel evalua- 
tion. When a program is being 
evaluated, staff can take this 
very personally. They may feel 
that it is they who are being 
critiqued and this puts them 
on the defensive. One way to 
address this is to explain the 
difference between studying 
individual performance and 
examining the complex sys- 
tem in which a program oper- 
ates. Individuals operating 
alone in a complex system 
benefit from a better under- 
standing of how systemwide 



change happens. The evalua- 
tor can help the staff under- 
stand that for any program to 
work, all people involved need 
to get beyond assigning blame 
and join together to address 
the big issues. 



Several issues about the 
evaluation process should be 
clarified with program staff. 
The intention is to raise staff 
awareness of the usefulness and 
power of the evaluation work 
being done at the school. 



■ Some staff believe that 
evaluation data will be used 
exclusively to decide if a pro- 
gram will be refunded. Natu- 
rally, they are reluctant to 
reveal any problems, con- 
cerns, or weaknesses if they 
think that making such infor- 
mation public will mean the 
elimination of program funds. 
Staff need to understand that 
the purpose of implementa- 
tion evaluation is program 
improvement, not funding 
decisionmaking. Being clear 
about how specific informa- 
tion will be used is essential. 

■ Staff may believe that eval- 
uation needs to be done by 
an impartial observer. They 
may think they should keep 
their distance to avoid "con- 
taminating" the data. The 
evaluator needs to stress the 
importance of staff involve- 
ment and participation in 
evaluation.

■ If external evaluators par- 
ticipate in this data collec- 
tion, they need to clarify 
their own role and function. 
Those collecting the data 



need to explain that they en- 
vision the evaluation process 
as a way to learn, rather than 
as a chance to criticize. 

Research has shown that 
staff cooperation and under- 
standing help a school use for- 
mative evaluation to improve 
its implementation of compre- 
hensive reform. To make that 
happen, the following points 
about evaluation need to be 
explained at staff meetings:³

1. Evaluation should be planned 
early. The earlier the data are 
collected, the more likely those 
data can be used during the 
course of the program. 

2. The evaluation must include 
multiple perspectives such as 
ideas from school staff, from 
the district offices, and from 
the reform model trainers 
working with the school. 

3. This program does not oper- 
ate in isolation from the larger 
context of the school. To ensure 
that the evaluation tackles the 
background or contextual issues, 
the evaluation process needs to 
examine the supportiveness of 
school culture and district poli- 
cies for schoolwide reform. Staff 
should be aware that evalua- 
tion work may include reviews 
of other programs in the school 
to see where and how multiple 
programs overlap. 

4. The evaluation will be look- 
ing at how staff development is 
incorporated into classroom in- 
struction and management. This 
will mean classroom visits to 
monitor and assess program 



2. A summary of these issues is available in Implementation Transparency #1. 

3. These ideas are summarized in Implementation Transparency #2. 






implementation. Those working 
on the evaluation should reas- 
sure staff that data about the 
work of individual teachers will 
be kept confidential. They 
should also stress that differ- 
ent rates of implementation 
across classrooms are natural. 

5. Feedback from data collected 
will be provided to school staff 
as quickly as possible. 

6. The same data will be col- 
lected repeatedly so that the 
school can assess progress. This 
means that when the school 
selects a data collection tool, it 
is making a commitment to use 
that instrument several 
times — either during the 
school year or for several years 
in a row. With this in mind, in- 
strument selection needs to be 
done carefully and thoughtfully. 

7. Much can be learned when 
the school's progress is com- 
pared to other reform efforts or 
national norms. To make such 
comparisons, schools may need 
to use measures that have been 
used in other settings. 
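
Point 6 above, collecting the same data repeatedly, can be sketched in a few lines of Python. The example compares results from the same instrument across consecutive collection rounds. The round labels, measure names, and numbers are hypothetical and serve only to show the calculation.

```python
def progress_between_rounds(rounds):
    """Return the change in each shared measure between consecutive rounds.

    `rounds` is a list of (label, results) pairs in collection order, where
    `results` maps a measure name to a number from the same instrument.
    Measures missing from either round are skipped, because a change can
    only be computed when the same item was collected both times.
    """
    changes = {}
    for (prev_label, prev), (next_label, nxt) in zip(rounds, rounds[1:]):
        shared = [m for m in prev if m in nxt]
        changes[f"{prev_label} to {next_label}"] = {
            m: nxt[m] - prev[m] for m in shared
        }
    return changes


if __name__ == "__main__":
    # Hypothetical percentages from a staff survey given twice in one year.
    rounds = [
        ("Fall", {"uses_model_daily": 40, "can_explain_model": 25}),
        ("Spring", {"uses_model_daily": 55, "can_explain_model": 45}),
    ]
    print(progress_between_rounds(rounds))
```

Skipping measures that appear in only one round reflects the reason for committing to a single instrument: a change can be interpreted only when the same question is asked each time.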

Step 2: Initiate data collection
and promote ongoing dialogue
about evaluation at
staff meetings and all meetings
with parents and community
members. Introduce
evaluation concepts at staff
and community meetings and
take this opportunity to collect
attitude and belief data.
Four ideas for doing this are
provided in the "Presenter's
Guide and Training Materials."
These activities can be used
to spark staff and community
conversations about the reform
model. They also help
evaluation planners understand
the overall context for
program implementation. The
questions outlined in each activity
can be adapted to each
site and used to collect formative
evaluation evidence at the
beginning of any implementation
process.

The evaluator can demonstrate
possible pitfalls in implementation
by citing the results from
the recent RAND evaluation
(Bodilly, Keltner, Purnell,
Reichardt, & Schuyler, 1998),
which documented a number
of schools' efforts at schoolwide
reform. The study focused on
the Cincinnati School District,
where three different models
were implemented and supported
by the district. When
teachers were surveyed at the
end of Year One about their new
program, it was clear that many
teachers who were supposed to
be implementing the model
were still uncertain about
the work they were doing.

The RAND Study found that:

■ Only 57 percent of the teachers could identify which model was being used
in their school

■ 27 percent felt they could explain the model's philosophy to others

■ 44 percent were unclear about success criteria (how their new program
would be judged)

■ 38 percent felt that lack of success would lead to termination of the program

■ 22 percent felt that their personal efforts would affect the success of the
design

■ 23 percent said they had strayed from the design (within certain designs,
this was as high as 53 percent)






If the school district had 
known about teachers' lack of 
knowledge earlier in the year, it 
would have been able to remedy 
some of these implementation 
issues. This is where implemen- 
tation evaluation can be helpful. 
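
An early check on staff understanding could be tallied with a short script. The Python sketch below assumes a brief yes/no staff survey; the item names and the 75 percent threshold are hypothetical choices made for illustration, not items or cutoffs taken from the RAND study or from this guide.

```python
# Hypothetical yes/no survey items; the names are illustrative only.
ITEMS = [
    "can_identify_model",
    "can_explain_philosophy",
    "clear_on_success_criteria",
]


def percent_yes(responses, items=ITEMS):
    """Return the percentage of teachers answering yes (True) to each item."""
    if not responses:
        return {item: 0.0 for item in items}
    total = len(responses)
    return {
        item: round(100 * sum(1 for r in responses if r.get(item)) / total, 1)
        for item in items
    }


def flag_items(percentages, threshold=75.0):
    """List items whose yes-percentage falls below the (assumed) threshold."""
    return [item for item, pct in percentages.items() if pct < threshold]


if __name__ == "__main__":
    # Four hypothetical teacher responses.
    responses = [
        {"can_identify_model": True, "can_explain_philosophy": False,
         "clear_on_success_criteria": True},
        {"can_identify_model": True, "can_explain_philosophy": False,
         "clear_on_success_criteria": False},
        {"can_identify_model": True, "can_explain_philosophy": True,
         "clear_on_success_criteria": False},
        {"can_identify_model": False, "can_explain_philosophy": False,
         "clear_on_success_criteria": False},
    ]
    results = percent_yes(responses)
    print(results)
    print("Needs follow-up:", flag_items(results))
```

Items that fall below the threshold point to topics worth raising at the next staff meeting, while there is still time in the school year to respond.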

Step 3: Discuss the way pro- 
gram implementation is likely 
to happen in schools. It is at
this point that the evaluation
identifies key components of 
the selected model along with 
an expected timeline for the 
process to take hold in the 
school. Research at schools 
that have put comprehensive 
efforts into place has shown 
that one of the major road- 
blocks to the success of any 
program is getting the program 
widely and consistently used 
by staff around the school. 

Before the evaluator can 
begin to collect information 
about implementation, the 
school will benefit from some 
common understandings about 
the stages staff typically go 
through to implement a new 
program. This discussion is 
likely to be most productive 
when grounded in a research- 
based theoretical framework — 
that is, when the staff has a 




common vocabulary based 
upon research-proven con- 
cepts. Such a framework can 
promote meaningful dialogue 
about evaluation. Using a frame- 
work increases communication 
about both evaluation and im- 
plementation. One framework 
that works well is Levels of 



Use, developed by Shirley 
Hord and her colleagues (Hord, 
Rutherford, Huling-Austin, & 
Hall, 1988). The levels that
staff members go through as they work
with an innovative program are
described briefly in the box below.⁴



Additional information on 
how schoolwide information 
about Levels of Use can be 
summarized is available in Im- 
plementation Evaluation Trans- 
parency #5 and Implementation 
Evaluation Handout #2.⁵

Developing Evaluation 
Questions 

Once the preparatory work is
done, schools should consider 
what kinds of information would 
help ensure complete program 
implementation. To do that, 
schools need to learn more about 
existing conditions. They should 
collect baseline information 
that paints a clear picture of 
the pace and scope of change 
taking place in the school. 



The levels of use Hord describes are as follows: 

Non-Use: Teacher has little or no knowledge of the new approach, no 
involvement with it, and is doing nothing toward becoming involved. 

Orientation: Teacher is acquiring information about the new approach 
and/or has explored its value and its orientation, what it will require. 

Preparation: Teacher is preparing for first use of the innovation. 

Mechanical use: Teacher starts to use the new approach but focuses her 
or his effort on the short-term, day-to-day use of the innovation with little 
time for reflection; use is disjointed and superficial. 

Routine: Teacher use is stabilized. Few if any changes are being made in 
ongoing use. Teacher no longer needs to prepare or give additional thought 
to use this approach. Time is not spent improving the approach or 
identifying its consequences. 

Refinement: Teacher varies the approach to increase impact. Teacher 
examines both short- and long-term consequences to learn more about 
what works best. Use of this approach is based on input from (and in 
coordination with) colleagues. It is at this point that the primary focus 
becomes benefiting students. 

Integration: Teacher uses approach with related activities to achieve a 
collective impact on students. Teacher explores major modifications of 
the approach to ensure maximum benefit. 

Renewal: User moves toward a new approach. 
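
One way to summarize these levels across a school is a simple tally. The Python sketch below counts how many teachers sit at each level and reports the share of staff at each one. The teacher identifiers and level assignments are hypothetical, and the tally is an illustration rather than the scoring procedure used by Hord and her colleagues.

```python
from collections import Counter

# Levels of Use, in the order they are listed in this guide.
LEVELS = [
    "Non-Use", "Orientation", "Preparation", "Mechanical use",
    "Routine", "Refinement", "Integration", "Renewal",
]


def summarize_levels(teacher_levels):
    """Return (level, count, percent) rows for a {teacher: level} mapping."""
    unknown = set(teacher_levels.values()) - set(LEVELS)
    if unknown:
        raise ValueError(f"Unrecognized levels: {sorted(unknown)}")
    counts = Counter(teacher_levels.values())
    total = len(teacher_levels)
    rows = []
    for level in LEVELS:
        percent = round(100 * counts[level] / total, 1) if total else 0.0
        rows.append((level, counts[level], percent))
    return rows


if __name__ == "__main__":
    # Hypothetical assignments; identifiers keep individual teachers anonymous.
    staff = {
        "T1": "Orientation",
        "T2": "Mechanical use",
        "T3": "Mechanical use",
        "T4": "Routine",
    }
    for level, count, percent in summarize_levels(staff):
        print(f"{level:15} {count:3} {percent:5.1f}%")
```

Reporting counts by level rather than by name is consistent with keeping data about individual teachers confidential.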



4. These concepts can be introduced to the staff using Implementation Evaluation Transparency #3, Levels of Use Related to Instructional Implementation.
5. Implementation Evaluation Transparency #4 illustrates how the level of staff use can be assessed through a series of simple questions.





Review Existing Data 

Those working on evaluation
should start with a review of 
descriptive information about 
the school. This would include 
brief descriptions of program 
participants, an overview of 
the plan and goals for the 
comprehensive program, and 
contextual information. Much 
of this information can usually 
be pulled from a grant applica- 
tion, but it may need updating 
and further specification. 

Decide What Additional 
Data To Collect 

At this point, schools will
begin developing research 
questions. The questions are 
written for two purposes: first, 
to explore concerns or issues, 
and second, to confirm hypo- 
theses or troubleshoot problems. 
There is no set of generic ques- 
tions that will work for all pro- 
grams. Unique questions need 
to be written for each program 
to focus the data collection on: 

■ The type of program being 
implemented 

■ What the school is trying to 
accomplish 

■ Specific contextual issues 
facing the school 

Sample implementation eval- 
uation questions are shown in 
the box at the top of this page.⁶

While there are no magical 
questions that will work in all 
situations, there are criteria 
that can be applied to deter- 
mine if the questions chosen 
will be useful in guiding the 



Are staff members knowledgeable 
about comprehensive changes 
required by the reform model being 
implemented? 

Do staff members demonstrate a 
commitment to the needed training? 

Is the program being implemented 
as it was designed? 

Are staff using the new instructional 
practices that were taught to them 
during inservice sessions? 



evaluation design. Questions 
should be: 

■ Clear 

■ Specific 

■ Pertinent to essential as- 
pects or components of im- 
plementation of this program 

■ Focused on a manageable 
set of issues 

The wording of these questions is a very important part
of the process of designing an evaluation. How these
questions are stated will have implications for the kinds of
data that will be collected, the sources of the data, and
the analyses that will be done on the data. Ultimately, the
way the questions are worded will affect the kinds of
conclusions that can be drawn about the program.



Here is an example. Suppose 
your implementation evaluation 
is geared to find out how clear 
the new program is to teachers. 
This issue of clarity can be ad- 
dressed in several different eval- 
uation questions. Here are two 
possible question formulations: 

■ Do the participating teachers
have a clear understanding
of the purpose and goals
of the program?

■ Have criteria been estab- 
lished to determine if the 
program is clear enough to 
the teachers so that they 
can implement it? 

The data collection approach 
differs dramatically depending 
on which of these questions is 
chosen. For question #1, the 
evaluator would collect data 
from the teachers themselves 
to determine their understanding.
But with question #2, the
evaluator would be more likely
to turn first to the program
developer and to written
documentation to learn if criteria
existed and then, using this
information, would design an
instrument to be used with the
teaching staff.




How questions are generated is very important. Schools should carefully 
consider who should be involved and what resources they should use. Without 
a doubt the best evaluation work is done when multiple perspectives are taken 
into account. While staff may formulate a set of initial questions, many other 
stakeholders should have an opportunity to provide input. This will increase 
ownership and participation in the evaluation and increase the likelihood that 
evaluation results are used. 



5. These are displayed on Implementation Evaluation Transparency #6.






So how are these evaluation 
questions developed? One strat- 
egy is to interview program 
staff and then use their input 
to propose several evaluation 
questions for staff review. An- 
other approach is to hold a 
meeting with staff to talk 
about the work that will be 
done throughout the school 
and then ask the staff to list 
their concerns about the pro- 
gram. This information can then 
be shaped into evaluation ques- 
tions. Evaluation questions can 
also flow from an understanding 
of the factors that are most 
likely to help and hinder
implementation. These are
summarized in Implementation
Evaluation Transparency #7. Remember that
evidence for the evaluation can 
take many forms, but that the 
data collected must be relevant 
to program improvement deci- 
sions. Evaluators should ask 
themselves, "If staff knew the 
answer to this specific question, 
how could or would they act 
with this information?" Certain 
types of information, while in- 
teresting, may not help the 
staff to make changes. So the 
useful rule of thumb is to
determine which data are most
needed to correct or fine-tune
a program.

Evaluators can also collect data 
related to factors that may be 
preventing program implementa- 
tion, along with some documen- 
tation of ways that these barriers 
are being addressed. A simple 
form for this type of documen- 
tation is shown in Implementa- 
tion Evaluation Handout #3. 
Program managers are encouraged
to plan intermittent review of 
such barriers to learn if adequate 
support is being provided for 
program implementation. 



Because comprehensive re- 
form is complex, it is important 
not to narrow down the data 
collection too early. Also, it is 
best to save all data collected. 
While some data may not seem 
immediately relevant, new issues 
may emerge during the course 
of the analysis phase, or pro- 
gram priorities may change. 

Planning the Evaluation 

Step 1: Work closely with
the planning team and 
with the professional develop- 
ers who are presenting train- 
ing related to the school's 
comprehensive model. Know- 
ing what staff will be learning 
and when they will be learning 
it is a crucial part of the im- 
plementation evaluation. In 
addition to the actual staff 
development days, there may 
be follow-up meetings and/or
a series of benchmarks that es- 
tablish the timeline for imple- 
mentation. Staff need to be 
intimately familiar with this 
schedule and to use this infor- 
mation in evaluation design 
and measurement selection. 

From the beginning, the evalu- 
ation must be structured around: 

■ The schedule of training 
events 

■ Key information that will 
be provided at each profes- 
sional development event or 
meeting 

■ Likely stages of implemen- 
tation (including information 
about typical variability 
among the staff in the pace 
of implementation) 






Once this information has 
been gathered, it is time to 
sketch out the data collection 
design. 



Professional developers are 
often excellent sources of detailed 
information about implementation 
of their model. They can provide 
information about other schools' 
experiences with the model and 
about problems that may crop up. 
They can describe program 
idiosyncrasies, such as whether 
teachers in certain grade levels 
are most likely to implement the 
program; whether certain trainings 
need repetition and support before 
teachers will adopt the approach; 
or what level of staff preparedness 
and support is needed for full 
implementation. A conversation 
with the professional development 
team can provide solid background 
for the evaluation plan. 



Step 2: Design a matrix that 
lists the kinds of data that 
would answer the research 
questions and that pinpoints 
the best time to collect each 
kind of data. There are several 
things to consider in the de- 
sign of the matrix: (1) how to 
ensure that you have adequate 
information, (2) how data col- 
lection will be conducted, and 
(3) when and where the data 
collection activities will occur. 
One of your goals will be to gath- 
er data from enough sources to 
provide balanced information. 

This is the time to consider a 
variety of data-gathering strat- 
egies. When you are deciding 
which data to collect and how, 
there will be pragmatic consid- 
erations: 



■ The value these data have
as evidence

■ The cost to collect them

■ The amount of intrusion
into school routines

■ Any ethical considerations
or constraints being placed
on the evaluation

There are other considerations as well. One is how to
communicate information about the evaluation to all participants.
Duration of data collection, as well as coding and storage of
data, are other concerns that will affect staff and program design.

A number of examples of data collection procedures are shown
on Implementation Evaluation Transparencies 8, 9, and 10.
Reviewing these ideas will provide some examples of procedures
that are often used in implementation evaluation. Obviously the
design developed for each school will need to consider the size
of the school, the amount of time that staff have available
for interviews, the structure of faculty meetings, and the
timeline established for professional development.



To ensure unambiguous interpretation of data, it is important
to pretest the items, that is, try them out with a number
of staff members. Questions should elicit complete answers
that directly address your questions.



Throughout these early stages of implementation evaluation, evaluators
should keep the following key points in mind:

1. Encourage continuous reflection and thinking about the reform process.

2. Recognize there is no one-size-fits-all comprehensive reform model.
Help staff realize that any reform model needs to be adapted for use at
each school, and that input from staff is imperative. To ensure that
progress is made, evaluation planning needs to include a timeline of events
or activities as well as a description of what teachers are expected to
implement during the year.

3. Inform the staff that for a school reform effort to be comprehensive, it
needs full participation from a broad base of school community members.
Including a greater number of stakeholders in evaluation planning
encourages greater participation in the reform.



Implementation Evaluation Transparency #11 illustrates the
development of a matrix demonstrating data collection procedures
for an elementary school. Once a matrix has been completed for
your school, the matrix will serve as a visual representation of
the evaluation design. It can serve as both road map (to show
where the evaluation is headed) and timeline (to keep the data
collection on schedule).
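
For schools that keep their evaluation plan electronically, the data collection matrix from Step 2 might be sketched as a simple list of rows. This is only an illustration: the field names, the sample questions assigned to each row, the week numbers, and the helper functions are all assumptions made for the example, not part of Transparency #11.

```python
from dataclasses import dataclass


@dataclass
class MatrixRow:
    question: str     # evaluation question the data should answer
    data_source: str  # who or what supplies the data
    method: str       # how the data will be collected
    week: int         # planned week of the school year (hypothetical numbering)


# Hypothetical rows; replace with your school's own questions and schedule.
matrix = [
    MatrixRow("Is the program being implemented as it was designed?",
              "Teachers", "Classroom observation", 12),
    MatrixRow("Is the program being implemented as it was designed?",
              "Professional developers", "Interview", 8),
    MatrixRow("Are staff using the new instructional practices?",
              "Teachers", "Interview", 20),
]


def timeline(rows):
    """Timeline view: rows ordered by planned collection week."""
    return sorted(rows, key=lambda row: row.week)


def sources_per_question(rows):
    """Road-map view: which sources inform each evaluation question."""
    sources = {}
    for row in rows:
        sources.setdefault(row.question, set()).add(row.data_source)
    return sources


def single_source_questions(rows):
    """Questions informed by only one source, which may need more balance."""
    return [q for q, s in sources_per_question(rows).items() if len(s) < 2]


for row in timeline(matrix):
    print(f"Week {row.week:>2}: {row.method} with {row.data_source}")
print("Needs another source:", single_source_questions(matrix))
```

Listing the questions that rely on a single source is one way to check the goal stated above of gathering data from enough sources to provide balanced information.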

Step 3: Select tools that will 
provide you with answers to 
your evaluation questions. Be 
sure to consider a variety of 
data collection tools. In the 
selection of data collection 
tools, staff gathering data 
should keep several consid- 
erations in mind: 

■ Balance 

■ Validity and reliability 

■ Participant perceptions 

It is often cost-effective to 
use preexisting instruments. 
These should be reviewed to 
make sure they are relevant 
to the school's needs. 



To ensure practicality of de- 
sign, schedule time not only 
for the data collection but also 
for the analyses and reporting 
of data. A general rule of thumb 
is that it takes one and a half 
to two times as much time to 
analyze the data as it does to 
collect them. It is also impor- 
tant to choose approaches that 
are simple enough to complete 
within the time available. If the 
evaluation has four days of data 
collection time available, for 
example, it will be impossible 
to schedule three days of inter- 
views along with two days of 
focus-group meetings. 
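
The arithmetic behind this rule of thumb can be checked with a few lines of code. The sketch below is a hypothetical planning aid; the function names and the plan dictionary are assumptions made for illustration.

```python
def total_days_needed(collection_days, analysis_ratio=2.0):
    """Collection time plus analysis time.

    analysis_ratio is how many times longer analysis takes than collection;
    the rule of thumb in the text puts it between 1.5 and 2.
    """
    return collection_days * (1 + analysis_ratio)


def fits(planned_activities, available_collection_days):
    """True if the planned collection activities fit in the days available."""
    return sum(planned_activities.values()) <= available_collection_days


# The example from the text: four days available, five days planned.
plan = {"interviews": 3, "focus groups": 2}
print(fits(plan, 4))  # False

# Four days of collection implies 10 to 12 days overall once analysis is added.
print(total_days_needed(4, 1.5), total_days_needed(4, 2.0))  # 10.0 12.0
```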

Collecting Evaluation Data 

Data collection can include
information about many 
components of a comprehen- 
sive program such as: 

■ Professional development 
activities 

■ Parental involvement 
■ External technical support 
and assistance 

When collecting data, staff 
members need to accurately 
record what they see and hear 
and avoid making judgments. 
They should concentrate on 
recording observations or con- 
versations in an objective way. 
To capture the information as
cleanly as possible, the evaluation
should include the development
of data collection guides:
forms providing questions
and space for recording
verbatim notes from interviews
or classroom activities.

Data collectors should encour- 
age reflective thinking by: 

■ Using wait time 

■ Keeping good eye contact 

■ Asking staff to explain 
their comments or to provide 
specific examples or anecdotes 

Minimizing Bias 

Bias is always an issue in
data collection. To avoid 
getting a biased view of the 
program, data collectors need 
to ensure they are getting a 
broad representation of views. 
Therefore, it is essential to ran- 
domly select individuals to in- 
terview or observe, but at the 
same time to make sure that all 
key groups are included in your 
sample. Here is how this works. 
Suppose that one key aspect (variable) under study is how
well teachers at each grade level are implementing the new
program. To avoid disruption, the evaluation could just ask for
teachers to volunteer to participate in an interview, but then
researchers would only get the staff members who already had
a reason to share their perspective. To avoid bias in this case,
while at the same time ensuring that each grade level is
represented, the evaluation design should call for the random
selection of one teacher from each grade level.



Bias is the personal and unreasoned
distortion of judgment. Bias is
evident when conclusions are
reached, not based upon facts, but
instead because those analyzing the
data already have certain viewpoints
or perspectives.
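
The random selection of one teacher per grade level can be done with a short script rather than by drawing names by hand. The sketch below is a minimal illustration; the roster, the teacher labels, and the function name are hypothetical.

```python
import random


def pick_one_per_grade(teachers_by_grade, seed=None):
    """Randomly choose one teacher from every grade level.

    Passing a seed makes the draw repeatable, so the selection can be
    documented and re-created if anyone questions how it was made.
    """
    rng = random.Random(seed)
    return {grade: rng.choice(names)
            for grade, names in sorted(teachers_by_grade.items())}


# Hypothetical roster keyed by grade level.
roster = {
    3: ["Teacher A", "Teacher B", "Teacher C"],
    4: ["Teacher D", "Teacher E"],
    5: ["Teacher F", "Teacher G", "Teacher H"],
}

chosen = pick_one_per_grade(roster, seed=2000)
for grade, teacher in chosen.items():
    print(f"Grade {grade}: {teacher}")
```

Every grade level appears in the result, which keeps all key groups in the sample, while the choice within each grade is left to chance rather than to volunteers.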

Those participating in the 
program are most likely to view 
the results as biased when an 
evaluation is unduly influenced 
by, disrupts, or threatens ongo- 
ing social and institutional re- 
lationships. If informants have 
a reason to distrust the evalua- 
tion process, they may appear 
helpful but can be withholding
or shaping information out of 
self-interest. 

To reduce the effects of bias 
during data collection: 

1. Use unobtrusive measures
whenever possible.

2. Make sure the purpose is
completely clear to informants.
Make certain they have a copy
of your research questions,
remind them why the evaluation
is being done, and tell them
what you will do with the
information. This builds trust.

3. Include dissidents and "cranks"
to achieve a balanced picture.

4. Triangulate (checking your
research question[s] against
other already validated measures)
with several collection methods.

5. If you sense you are being
misled, focus on why.

6. Show field notes to an outside
reader (without breaking
confidentiality).

7. Keep your research questions
firmly in mind.

Analyzing and
Interpreting the Data

Once the data have been collected, the school staff
must make sense of them. Meaning will emerge from analysis
that is both systematic and thoughtful. The analysis requires
blending technical skills to organize data quantitatively
with intuitive skills to tease out the messages that may lie
hidden behind the responses of individuals.



Reviewing the data and generating hypotheses about what they say may be the job
of a small group. But getting a complete understanding of the underlying meaning
often becomes a whole-group task. Structuring meeting time to encourage group
input provides multiple perspectives while at the same time providing immediate
feedback to a large number of stakeholders.

In particular, be sure to include those who spend their days implementing the
program in any data interpretation activities. When staff members work with
the data, they become familiar and comfortable with them. Making the findings
more accessible to the staff increases the likelihood the results will be used.

While the choice of analysis method depends on the type of
information and the purpose of the analysis, the summary that
emerges needs to describe either quantitative (percentages,
averages) or qualitative (description of themes that emerge
from the reader's point of view) information.

It's best to organize the data
for each research question sepa- 
rately. This ensures that data 
addressing one question can 
be examined without contami- 
nation from data addressing 
other questions. By looking at 
all the data related to one ques- 
tion, data analyzers can deter- 
mine if the data support one 
conclusion, or if in fact there 
are various perspectives. When 
reviewing the data, the evalua- 
tion should look at the "big pic- 
ture," as well as smaller themes 
that surface. Once the data have 
been examined for each ques- 
tion, the analysis should expand 
its focus to include the full data 
collection. The staff may then 
begin to see a pattern of issues 
that touch multiple aspects of 
the program. 

The next question to ask is 
what these data say about the 
path the program should take. 



Knowing what decisions are to 
be made and by whom will help 
determine the best way to con- 
duct a secondary analysis of the 
data. If the staff wants to know 
how much time teachers at var- 
ious grade levels need to cover 
certain material, then collecting 
unit completion information 
from every classroom teacher 
would prove most useful. If the 
principal wants to know whether 
teachers are adapting program 
components to provide different 
instruction for separate groups 
inside a classroom, then the
within-classroom variability 
data should be disaggregated 
to isolate findings for those 
subgroups. 
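
Disaggregating data means computing the same summary separately for each subgroup. The sketch below applies this to the unit completion example; the records, field names, and function name are hypothetical and stand in for whatever a school actually collects.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical unit completion records, one per classroom teacher.
records = [
    {"grade": 3, "days_to_complete_unit": 12},
    {"grade": 3, "days_to_complete_unit": 14},
    {"grade": 4, "days_to_complete_unit": 10},
    {"grade": 5, "days_to_complete_unit": 9},
    {"grade": 5, "days_to_complete_unit": 11},
]


def average_by(rows, group_key, value_key):
    """Average value_key separately for each distinct value of group_key."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row[value_key])
    return {group: mean(values) for group, values in groups.items()}


by_grade = average_by(records, "grade", "days_to_complete_unit")
for grade in sorted(by_grade):
    print(f"Grade {grade}: {by_grade[grade]} days on average")
```

The same function could group by any other field, such as an instructional group within a classroom, if the records include it.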

Reporting the Data 

It is likely that funding agencies
will require some type
of written report in a format 
that is useful to them, but ad- 
dressing your report to their 
requirements alone leaves im- 
portant work undone. Sending 
that evaluation report off to 



the funder's files will not im- 
prove your program. Instead, 
the findings of the evaluation 
need to be portrayed thought- 
fully in a way that will com- 
municate with the staff of this 
school. In the case of imple- 
mentation evaluation, a sum- 
mary of the results along with 
some help interpreting the data 
is of utmost importance. This is 
best done in a combination of 
oral and written reports. Since 
those implementing the pro- 
gram are busy people, it is im- 
portant to keep both types of 
reports short, allowing time in- 
stead for the users to discuss 
the reports' implications for 
their day-to-day work. 

Reporting to school staff 
should reflect the concerns 
of the audience. What are they 
worried about? What information 
do they need to tackle their most 
pressing concerns? The informa- 
tion should be presented in 
language the audience can 
relate to and understand. 



Evaluation findings should be shared at both school and community meetings. To make the presentation of the 
data more accessible and interesting, the presenter should: 

■ Get the audience involved by giving them a brief warm-up activity. 

■ Try to talk with, not at, the audience. 

■ Use conversational language and avoid technical words. 

■ Present the data in creative formats that will engage the audience. Use graphs and charts to make the presentation 
of information as visual as possible. 

■ Punctuate the presentation with audience questions that will encourage the program implementors to reflect on 
the data. 

■ Place nothing between presenter and audience. Don't stand behind a lectern. If possible, mingle with the audience.

■ Use the names of the participants whenever possible, and encourage them to interact with one another.

■ Smile and look relaxed. 

■ Use humor whenever possible. 

■ Use personal anecdotes and stories. These give the audience something to relate to and bring the presentation
down to earth.



Timing is vital in the prepa- 
ration of both verbal and writ- 
ten reports. Those generating 
a report need to know the pro- 
gram schedule. For example, 
when will the planners hold 
their meetings? When will staff 
development take place? When 
is staff likely to make program 
adaptation? Reports should 
provide enough detail to enable 
the staff to make midcourse 
corrections. The evaluation re- 
porting cannot wait until the 
end of a year or the completion 
of the project. 

The number and type of 
groups that will receive the 
information are also crucial 
considerations. Ideally, find- 
ings should be shared with 
anyone who participates in 
this program. Whenever possi- 
ble, information should go to 
staff, students, parents, and 
the community. Sometimes it 
is useful to share information 
in several formats for the dif- 
ferent audiences. 

There are a number of ways to 
present findings, the most com- 
mon being a written report. 
Such reports can vary greatly 
in style, depending on the au- 
dience. Style options include 
journalistic summary, dialogue, 
testimony, question and answer, 
or scenario. Certain kinds of 
data may best be presented in 
a graph or chart, case studies, 
panel discussions, or simulations. 

Presentation method and style 
should be tailored to the audi- 
ence and their intentions — that 
is, who will receive the report 
and how they will use it. While
the formal report may take 
longer, a draft of several key 
findings could be completed 



and distributed very quickly. 

For some audiences, small seg- 
ments of findings doled out a 
bit at a time or a streamlined 
version of overall findings may 
suffice. But those who are work- 
ing to implement the compre- 
hensive program will benefit 
most from a report that is rich 
in descriptive detail. 

School staffs are most likely 
to use the findings if: 

■ They have been closely as- 
sociated with the evaluation 
effort 

■ They have a long-standing 
commitment to the use of 
data 

■ Conclusions are presented 
in a straightforward, under- 
standable way 

■ They receive the information 
at the time they need it 

■ Evaluators share their ideas 
in draft form, solicit feed- 
back, and make revisions 

If possible, the written report 
should include comments and 
quotes from staff and/or stu- 
dents to make it more engaging 
to teachers. Staff or student 
comments lend credibility to 
the findings and give the infor- 
mation a human dimension. 

Using Data To Make 
Program Improvements 

To ensure that the data will
be used, the evaluation ef- 
fort also needs to include ways 
to facilitate discussion with de- 
cisionmakers about the steps 
they will take to put the data 
into action. These ideas should
be included in a school
improvement plan that lays out
strategies to strengthen
instructional practices. The plan should
be clear about what teachers 
are expected to do, include 
activities that are an integral 
part of daily instruction for all 
teachers, and ensure that teach- 
ers have or develop the skills to 
implement changes. 

Once the leadership team or 
a steering committee has the 
data in hand, take these steps: 

■ Review the strategies and 
action steps originally pro- 
posed in your grant applica- 
tion or school improvement 
plan. Identify who was re- 
sponsible for implementation 
and ascertain how far along 
the school was supposed to be 
at the time of data collection. 

■ Use the data to identify the 
parts of the plan or programs 
that are not being implemented
and other challenges
facing staff. 

■ Make sure staff are aware 
of the findings and then ask 
what else could be done to 
help your school make 
changes. 

■ With staff input, determine 
what additional training is 
needed to improve the im- 
plementation process. Decide 
what kind of staff develop- 
ment can get this done. 

■ Determine if new materials 
are needed and how they will 
be purchased or developed. 

■ Determine how to provide 
ongoing support to sustain 
implementation of the plan. 

■ Determine what added 
resources are needed to im- 
plement the revised improve- 
ment plan and how they will 
be obtained. 






■ Reestablish responsibilities 
and timelines needed to im- 
plement the revised plan. 

■ Communicate what has 
been incorporated into the 
plan to all staff, and ask all
staff to take action. 

■ Review the implementation 
evaluation design. Make 
changes as needed to gather 
data reflecting the modifica- 
tions. 

Reflections About 
the Implementation 
Evaluation Work 

The process of doing implementation
evaluation may
often seem paradoxical for 
those involved in program im- 
plementation. In most schools, 
a mix of insiders and outsiders 
is likely to be involved in the 
evaluation. While this can 
strengthen the process, it also 
adds to the complexity. Often, 
the insiders work closely with 
those implementing the com- 
prehensive program, while out- 
siders bring the perspective of 
impartial observers of change. 
The combination of these two 
perspectives adds richness to 
the process but also requires 
openness and sensitivity about 
the working relationships be- 
tween the groups. 



In addition, members of the 
evaluation team need to take on 
different roles at various times. 
Data collectors are asked to be 
equally comfortable talking with 
those in authority and those who 
have very little formal power.
They must recognize that comprehensive
programs need the
input of both groups if they are
to succeed. When determining if
implementation is occurring, 
data collectors need to be gen- 
uinely invisible, quietly watch- 
ing. But when the time comes to 
communicate results, the same 
individuals need to be highly visible,
sharing important information
and explaining the findings.

For all these reasons, selecting 
who will serve on your evaluation
team is an important decision.
And determining who should 
report the results is also a criti- 
cal decision. Presenting results 
can at times bring out tensions 
between two opposed groups: 
those who are working to get 
a new program in place and 
those resisting or struggling. 
The sensitive presentation of 
formative evaluation data has 
the potential to open the lines 
of communication between 
these two. 



Summary 

Implementation evaluation is
a way to assess the work be- 
tween program planning and 
program impact. Planning and 
conducting a formative evalu- 
ation is an ambitious project 
since it requires data collection 
that reaches into the classroom. 
To target the evaluation, it is 
necessary to develop specific 
evaluation questions, identify 
the most appropriate sources for 
the data needed, organize to get 
broad participation in the data 
collection and analysis process, 
and determine the best time 
and place to summarize the 
data and report the findings. 
Conducting an effective imple- 
mentation evaluation means 
keeping in close contact with 
the implementation process and 
the staff members who are mak- 
ing this program a reality. 



Resources 



Beyer, B.K. (1995). How to conduct a formative evaluation. Alexan- 
dria, VA: Association for Supervision and Curriculum Development. 

This book describes how to conduct a formative evaluation of edu- 
cational programs by assessing the program during various stages 
of its development. The author provides practical checklists, data- 
collection instruments, and other resources to assist in conducting 
the evaluation. 

Herman, J.L., & Winters, L. (1992). Tracking your school's success:
A guide to sensible evaluation. Newbury Park, CA: Corwin Press.

This comprehensive guide offers educators step-by-step procedures 
and practical guidance needed to conduct sensible assessments and 
evaluations, and record and measure progress. It also instructs the 
reader on how to use evaluation information to aid in school plan- 
ning and improve management decisions. 

King, J.A., Morris, L.L., & Fitz-Gibbon, C.T. (1987). How to assess 
program implementation. Newbury Park, CA: Sage. 

This book is one component of the Sage Publications series, The
Program Evaluation Kit, a set of guidebooks written to guide and 
assist program evaluators in planning and managing evaluations. 
The guide will help practitioners plan an evaluation of program 
implementation and design, and use appropriate instruments for 
generating data to support the plan. Procedures in the "how to" 
sections of the book are presented step by step to give maximum 
practical advice. 

References

Bodilly, S.J., Keltner, B., Purnell, S.W., Reichardt, R.E., & Schuyler,
G.L. (1998). Lessons from New American Schools' scale-up phase:
Prospects for bringing designs to multiple schools. Santa Monica,
CA: Rand.

Buechler, M. (1997). Scaling up: The role of national networks in
spreading education reform. Unpublished manuscript. Portland,
OR: Northwest Regional Educational Laboratory.

Hord, S.M., Rutherford, W.L., Huling-Austin, L., & Hall, G.E. (1987).
Taking charge of change. Alexandria, VA: Association for Supervision
and Curriculum Development.



Instructions for
Implementation
Evaluation
Transparencies

The transparencies in the
Overview section provided
background information on the
issue of formative or implementation
evaluation, including an
outline of the purpose of this
type of evaluation (Overview
Evaluation #2), comparisons
of formative and summative
evaluation (Overview Evaluation #4),
and generic formative
evaluation questions (Overview
Evaluation #7). Each Implementation
Evaluation transparency
discusses issues that arise early
in the evaluation process as
formative evaluation design
is being generated.

Transparency #1

Outlines several areas of misunderstanding
that staff can have
about the evaluation process.
Because these can undermine
data collection during formative
evaluation, the evaluator might
use this transparency to initiate
a brief discussion with staff to
clarify any misconceptions.

Transparency #2

Lists advice to those who will
be planning and conducting
implementation evaluation for
a comprehensive program.

Transparency #3 

Summarizes the Levels of Use 
framework, which shows that 
staff move through a number 
of phases before they can ef- 
fectively use a new approach. 
However, their progression
through these levels of use is
not uniform, and without sup- 
port many of the staff will not 
make it all the way around the 
circle. Many staff struggle with 
the mechanical use of a new 
program. Then, because they 
lack additional support and 
encouragement, they drift into 
routine use of the approach. 
Explain to the group that it is 
really in the refinement phase 
of implementation that student 
benefits are noted. (The hand- 
out on the Levels of Use pro- 
vides a brief description of 
each of the various levels.) 

Transparency #4 

Explains that interview questions 
such as the ones shown on this 
transparency can be used to de- 
termine where a staff is in rela- 
tion to its use of a new approach. 

Transparency #5 

This transparency is meant to ac- 
company the handout on pro- 
gram components, so that the 
presenter can explain the struc- 
ture of this type of data sum- 
mary. A unique matrix for each 
comprehensive program is devel- 
oped by working closely with the 
program staff to identify the key 
components to be implemented. 
Following the development of 
the list of essential program 
components, data on how com- 
pletely each component is being 
implemented by each teacher in 
the school are gathered via inter- 
views and observations. When all 
data are collected, the pattern 
of implementation for the whole
school is displayed in the matrix
as illustrated in the handout.
(This handout displays findings
for 10 teachers in the building.)
When showing this transparency,
the presenter needs to explain
that this transparency only 
shows the findings for the first 
component. In this row, each one 
of the asterisks represents the 
current level of implementation 
of one teacher in the building. 
This particular pattern shows 
that one of the 10 teachers has 
not yet rearranged the classroom 
(the first essential component of 
the program), and one of the 
teachers has progressed to the 
point of refining the process of 
classroom rearrangement to max- 
imize effectiveness. The imple- 
mentation level of the remaining 
eight teachers is somewhere in 
between. 
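
A row like the one described for Transparency #5 can be tallied and drawn with a short script. The sketch below is illustrative only. The level names follow the Levels of Use framework of Hord et al. (1987); the placement of the eight "in between" teachers is invented for the example, since the text does not say where each of them falls.

```python
from collections import Counter

# Levels of Use, from level 0 (non-use) upward.
LEVELS = ["Non-use", "Orientation", "Preparation", "Mechanical use",
          "Routine use", "Refinement", "Integration", "Renewal"]

# Hypothetical findings for one component (classroom rearrangement):
# one teacher has not started, one is refining, eight are in between.
observed = (["Non-use"] * 1 + ["Orientation"] * 2 + ["Preparation"] * 3 +
            ["Mechanical use"] * 3 + ["Refinement"] * 1)


def tally(levels_observed):
    """Count teachers at each level, keeping the levels in order."""
    counts = Counter(levels_observed)
    return [(level, counts.get(level, 0)) for level in LEVELS]


def render(levels_observed):
    """One line per level, with one asterisk per teacher at that level."""
    return "\n".join(f"{level:<16}{'*' * n}"
                     for level, n in tally(levels_observed))


def percent_at(levels_observed, level):
    """Share of teachers, as a percentage, currently at the given level."""
    return 100 * levels_observed.count(level) / len(levels_observed)


print(render(observed))
print(f"At refinement: {percent_at(observed, 'Refinement'):.0f} percent")
```

Because the display shows only counts per level, it summarizes the whole school without naming individual teachers.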

Transparency #6 

Provides some sample evaluation 
questions — ones that might be 
developed early in the process 
of comprehensive reform. 

Transparency #7 

This transparency outlines a 
number of factors that have 
been shown to affect program 
implementation. 

Transparencies #8 and #9

These two transparencies list a 
number of sources of data for 
the implementation evaluation. 

Transparency #10 

Provides additional explanation 
of the types of questionnaires 
or interview data that could be 
collected. 

Transparency #11 

Illustrates a data collection matrix
displaying evaluation questions,
data sources and timelines,
and approaches for collecting the
needed information.

Instructions for 
Implementation 
Evaluation Handouts 

Handout #1: 

Levels of Use About 
Instructional Implementation 

This handout provides a concise 
list of the Levels of Use, which 
characterizes the implementation
and innovation. Staff start
at level 0, where they have no 
knowledge of the changes they 
are being asked to make in a 
comprehensive reform model, 
and then proceed through the 
orientation and preparation lev- 
els. When staff first begin to use 
a new instructional approach in 
the classroom, they are entering 
the mechanical-use stage where 
they need both feedback and 
support. If these are not pro- 
vided, staff may continue to use 
the new instructional approach 
but will slip into routine use. 
When using the innovation in a 
routine way, staff are less likely 
to get the full benefit of the new 
approach. Ideally, staff need to 
be helped to move to the refine- 
ment level, where they make 
adjustments that provide the 
greatest benefit to the students. 

Handout #2: 

Program Components 

This handout illustrates how the 
Levels of Use can be used in pro- 
gram evaluation. The evaluator 
needs to identify the key com- 
ponents of the comprehensive 
reform model that are to be im- 
plemented at this site and list 



these on a form like this. Then, 
by interviewing the teachers at 
the school (using questions like 
the ones displayed on Implemen- 
tation Evaluation Transparency 
#4), the evaluator can assess how 
far along the various staff mem- 
bers are in putting these new 
practices into place. The infor- 
mation gathered can be dis- 
played on a grid like the one in 
this handout, without violating
confidentiality. For example, this 
handout demonstrates that all 
10 teachers in this school are 
at the mechanical-use level in preparing their
units collaboratively. However, 
when it comes to another pro- 
gram component (instruction 
is resequenced to match assess- 
ment expectations), two of the 
10 staff (20 percent) are at the 
refinement phase (making adjustments
in the classroom) and
the remaining 80 percent of the 
teachers are struggling at the 
mechanical use level, with 60 
percent just beginning mechan- 
ical use and another 20 percent 
reaching more advanced levels 
in their application of this ap- 
proach. The purpose of a chart 
like this is to demonstrate pro- 
gress toward implementation 
and to illustrate specific areas 
where additional support or 
staff development is needed.
For example, the data on this
handout demonstrate that 
teachers probably do not need 
added training on rearranging 
the classroom. 
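
The tallying described for this handout can be sketched in a few lines of Python. This is only an illustration: the level labels, the ratings, and the function name level_percentages are assumptions made for the example, not part of the handout. Reporting only percentages per level, rather than names, is what keeps the grid from violating confidentiality.

```python
from collections import Counter

# Level labels used for this sketch (an assumption; adapt to the site's own form).
LEVELS = ["non-use", "orientation", "preparation",
          "mechanical use", "routine use", "refinement"]


def level_percentages(ratings):
    """Return the percent of teachers at each level for one program component.

    Only levels that at least one teacher has reached are included.
    """
    counts = Counter(ratings)
    total = len(ratings)
    return {level: 100 * counts[level] / total
            for level in LEVELS if counts[level]}


# Hypothetical interview results for 10 teachers, one rating per teacher.
component_ratings = {
    "Instruction is resequenced to match assessment expectations":
        ["mechanical use"] * 8 + ["refinement"] * 2,
    "Teachers prepare units in collaboration with others":
        ["mechanical use"] * 10,
}

summary = {name: level_percentages(ratings)
           for name, ratings in component_ratings.items()}

for name, percentages in summary.items():
    print(name, percentages)
```

A component where most teachers sit at the lower levels would be a candidate for additional support or staff development.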

Handout # 3: 
Documentation of 
Implementation Interference 

This handout illustrates how an 
evaluator can record various 
events in the school that have 
an impact on implementation. 



The first column of this matrix 
lists a number of general areas 
where issues can arise that in- 
terfere with comprehensive re- 
form. Evaluators are likely to 
learn of these issues during in- 
terviews with staff or visits to 
the school. In the second and 
third columns, the evaluator 
would list the specific problem 
that was noted in the general 
area and the source of that in- 
formation, along with the date 
that the concern was noted. 

This matrix can be shared with 
the program staff periodically 
as a way to determine if the in- 
terfering factors are being ad- 
dressed. Program staff can be 
asked to indicate if they are 
aware of these issues and, if so, 
how the concerns are being ad- 
dressed. New data should be 
gathered in the same way as 
the old data to determine if 
barriers are coming down. 

All this information can be 
recorded succinctly on a chart 
like the one in the handout. 
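
For evaluators who keep this chart electronically, one possible record layout is sketched below in Python. The field names follow the column headings of the handout; the class name InterferenceRecord, the helper unaddressed, the dates, and the follow-up note are all hypothetical and serve only to illustrate the idea.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class InterferenceRecord:
    """One row of the interference chart (field names mirror the handout columns)."""
    issue: str                       # general area, e.g. "Finances"
    specific_information: str        # the specific problem that was noted
    data_source: str                 # where the evaluator learned of it
    date_noted: date
    how_addressed: Optional[str] = None      # filled in after review with program staff
    improvement_noted: Optional[bool] = None


def unaddressed(records):
    """Return the records for which no response has been recorded yet."""
    return [record for record in records if not record.how_addressed]


# Hypothetical entries; the dates and the follow-up note are invented for illustration.
log = [
    InterferenceRecord("Finances",
                       "Coordinators' time cut back because of limited funds",
                       "Examining budget records", date(2000, 1, 15)),
    InterferenceRecord("Leadership",
                       "Principal does not attend staff development",
                       "Meeting observation", date(2000, 2, 1),
                       how_addressed="Principal scheduled to attend next session"),
]

open_issues = [record.issue for record in unaddressed(log)]
print(open_issues)
```

Sharing the list of open issues with program staff at each review is one way to check whether barriers are coming down.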

Small-Group Activities 

Each small-group activity
is designed to reinforce or 
stimulate the discussion on 
a particular topic or concept. 
They may be conducted before 
or after the discussion. If the 
activity is done before the dis- 
cussion, the topic should be 
briefly introduced first. As a 
presenter, you should guide the 
participants through the activ- 
ity and then lead an interactive 
discussion of the results of the 
groups' work, drawing from the 
contents of the guidebook as 
appropriate to reinforce and/ 
or enrich the discussion. 



The small-group activity can 
also be scheduled to follow a 
more detailed discussion of the 
topic. In this case, the activity 
provides a way for the partici- 
pants to apply what they have 
learned in the presentation and 
discussion. 

Divide the audience into 
groups of about five people. The 
group can consist of members 
of a school team or just partici- 
pants selected by various means 
to form a group. 

As the evaluator introduces 
evaluation concepts to the staff, 
he or she can also begin to col- 
lect data about staff attitudes 
and beliefs. Four ideas for 
doing this are in this section. 
These activities can be used to 
spark staff conversations about 
the reform model and to help 
the evaluation planners understand
the context for program
implementation. The questions 
can be adapted to each site and 
used to collect formative evalu- 
ation evidence from staff. Once 
adapted, such questions can be
used during interviews or dur- 
ing staff meetings. Following 
the activity, refer the partici- 
pants to parts of the guidebook 
that discuss evaluation models 
and data collection (for instance, 
the Data Collection Matrix on 
Page 67). 

Small-Group Activity #1 

Staff input is also helpful in 
identifying site-specific issues 
related to the comprehensive 
nature of a program. To gather 
data about a program, the eval- 
uator can encourage staff to 
discuss the benefits and limita- 
tions of the new program from 
their own perspectives. Staff






meeting time can be used to 
get people to talk about the 
model they are adopting: 

■ What is the strongest fea- 
ture of the model that you 
have chosen? What makes it 
strong? How will you know 
that it is having the desired 
impact? 

■ What is the weakest feature? 
How can you strengthen its 
impact? 

Small-Group Activity #2 

After staff have participated 
in professional development 
in which key components of 
the new model are revealed to 
the staff, the evaluator can con- 
duct staff interviews to answer 
such questions as: 

■ How does this work connect 
with other work underway in 
the school? How much do pro- 
grams overlap? How much will 
this overlap affect implemen- 
tation? Is staff trying to im- 
plement several programs 
simultaneously? 

■ How much innovation and 
change does this reform de- 
mand of staff? 

■ How much does the project 
depend on help and support 
from outside the school? 

  ■ From trainers?

  ■ From community members?

  ■ From students? (Attendance or willingness to put in extra effort?)

  ■ From outside funding? (This is related to project sustainability.)

Small-Group Activity #3 

When conducting an evaluation 
for a comprehensive program, 
the evaluator needs to deter- 
mine if some aspects of the sys- 



tem limit progress. To identify 
what might slow down program 
implementation, the evaluators 
can start zeroing in on this in- 
formation early in the process. 
To help secure information 
about systemic issues, the eval- 
uator can ask staff about sys- 
temic barriers that prevent 
program implementation. 

To do so, the evaluator might 
ask the staff to fill in the sur- 
vey below: 

What parts of the system 
(school, district, state, or com- 
munity) might limit the school 
from using this new approach? 
List those limitations below, 
then rate the seriousness of 
these limitations on a 1 to 5 
scale: 



1 = least serious
5 = most serious 

■ School barriers 

1 2 3 4 5 

■ District barriers 

1 2 3 4 5 

■ State barriers 

1 2 3 4 5 

■ Community barriers 

1 2 3 4 5 

Small-Group Activity #4 

This activity is designed to be 
used as the model is being im- 
plemented. The evaluator can 
ask each individual in the group 
to complete his or her own per- 
sonal rating on these items and 
then to work in small groups to 
reach consensus. 

To introduce this activity, 
the evaluator can tell the school 
staff that reform models work 
best in situations that have 
open lines of communication. 



This enables consistent imple- 
mentation of the key elements 
of any reform model. Because it 
is difficult for any model to get 
all staff to "buy in" to the project,
it is helpful to get staff
perspectives as the model is 
being implemented. This activity 
asks staff to help improve the 
work of the school by critiquing 
and rating the work in progress. 

Ask all participants to rate (on 
a 1-5 scale) how well they be- 
lieve the school is doing in cer- 
tain areas such as the following: 

1 = doing poorly
5 = doing well 

■ Being clear about what 
the end result of the pro- 
gram will be for students 

1 2 3 4 5 

■ Promoting teamwork and 
opportunities for staff to 
learn from one another 

1 2 3 4 5 

■ Having a shared vision 
about how the new pro- 
gram will operate 

1 2 3 4 5 

■ Knowing the role of each 
staff member in the project 

1 2 3 4 5 

■ Having all staff use the 
same instructional practices 

1 2 3 4 5 

Once everyone has done the 
ratings individually, take 10 
minutes of the staff meeting 
time to form small groups, ask- 
ing staff members to compare 
their ratings and to discuss 
how they will know that they 
have achieved these various 
expectations for the project. 
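
If the individual ratings are collected on paper or in a spreadsheet, a short Python sketch like the one below can show where staff views diverge before the small groups meet. The function name summarize and the sample ratings are assumptions made for illustration; the items are taken from the list above.

```python
from statistics import mean


def summarize(ratings_by_item):
    """Return the mean rating and the spread (max minus min) for each item.

    Ratings must be on the 1-5 scale used in this activity.
    """
    summary = {}
    for item, ratings in ratings_by_item.items():
        if any(rating < 1 or rating > 5 for rating in ratings):
            raise ValueError(f"ratings for {item!r} must be between 1 and 5")
        summary[item] = {
            "mean": round(mean(ratings), 2),
            "spread": max(ratings) - min(ratings),
        }
    return summary


# Hypothetical ratings from five staff members.
ratings = {
    "Having a shared vision about how the new program will operate": [2, 3, 5, 4, 1],
    "Knowing the role of each staff member in the project": [4, 4, 5, 4, 4],
}

result = summarize(ratings)
for item, stats in result.items():
    print(item, stats)
```

A large spread on an item suggests that staff disagree about how the school is doing in that area, which makes it a useful starting point for the small-group discussion.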






Implementation Evaluation Transparency #1: Possible Misconceptions About Program Evaluation (Northwest Regional Educational Laboratory)




Implementation Evaluation Transparency #2: Data Collection Considerations in Formative/Implementation Evaluation





Implementation Evaluation Transparency #3: Levels of Use Related to Instructional Implementation




Implementation Evaluation Transparency #4: Interview Questions To Assess Implementation






Implementation Evaluation Transparency #5: Program Implementation for Teachers in a Building
Levels shown on the grid: Refinement, Routine Use, Mechanical Use, Preparation, Non Use






Implementation Evaluation Transparency #6: Sample Formative Evaluation Questions





Implementation Evaluation Transparency #7: What Helps Implementation?










Implementation Evaluation Transparency #8





Implementation Evaluation Transparency #9: Examples of Data Collection Techniques




Implementation Evaluation Transparency #10: Examples of Data Collection Techniques Related to Implementation





Implementation Evaluation Transparency #11: Data Evaluation Matrix




Handout: Levels of Use About Instructional Implementation

0. Non Use

Teacher has little or no knowledge of the new approach, no involvement with it, and is doing
nothing toward becoming involved.

I. Orientation

Teacher is acquiring information about the new approach and/or has explored its value, its
orientation, and what it will require.

II. Preparation

Teacher is preparing for first use of the innovation.

III. Mechanical Use

Teacher starts to use the new approach, but focuses his or her effort on the short-term,
day-to-day use of the innovation with little time for reflection; use is disjointed and superficial.

IV. Routine (a)

Teacher use is stabilized. Few if any changes are being made in ongoing use. Teacher no longer
needs to prepare or give additional thought to using this approach. Time is not spent improving
the approach or identifying its consequences.

IV. Refinement (b)

Teacher varies the approach to increase the impact. Teacher examines both short- and long-term
consequences to learn more about what works best. Use of this approach is based on input from
(and in coordination with) colleagues. It is at this point that the primary focus becomes
benefiting students.

V. Integration

Teacher uses approach with related activities to achieve a collective impact on students. Teachers
explore major modifications of the approach to ensure maximum benefit.

VI. Renewal

User moves toward a new approach.

From Taking Charge of Change, Shirley M. Hord, William Rutherford, Leslie Huling-Austin, and Gene E.
Hall

Implementation Evaluation Handout #1






Handout: Program Components

Levels of use across the top of the grid: Preparing to Use (0-1), Mechanical Use (2-3), Refining Day-to-Day Use (4-5). Percentages of teachers are listed from left to right as they appear in each row.

PROGRAM COMPONENTS

1. Classroom arrangements have been made to facilitate implementation: 90%, 10%

2. Classroom environment assessed to determine who will facilitate implementation: 20%, 20%, 60%

3. Teacher knowledge of students' interest guides program design: 20%, 60%, 10%, 10%

4. Teachers prepare units in collaboration with others at their grade level: 100%

5. Basic skills integrated into instruction: 60%, 40%

6. Picture books are used as recommended: 10%, 30%, 40%, 20%

7. Students assess their own learning: 10%, 10%, 40%, 30%, 10%

8. Instruction is resequenced to match with assessment expectations: 20%, 60%, 20%

Implementation Evaluation Handout #2

Permission is granted for reproduction by schools for classroom use. Written permission is required for any other use.







Handout: Documentation of Implementation Interference

Columns: Issue; Specific Information and Data Source; Date Noted; How was concern addressed?; Improvement noted?

Finances
■ Coordinators' time cut back because of limited funds
Data source: Examining budget records

Leadership
■ Principal does not act like he or she values the program; does not attend staff development, says little to staff
Data source: Meeting observation

Commitment
■ No pressure for commitment; teachers can choose to implement program at whatever level they wish
Data source: Teacher interviews

Political Issues
■ Administrators make decisions based upon political pressure
Data source: Interviews

Group Conflicts
■ Staff diversity causes internal conflicts

Facilities
■ Building cannot be upgraded to allow technology needed for program implementation

Management/Communication/Scheduling
■ Communication within the site is dysfunctional
■ Staff reschedule students throughout the year

Implementation Evaluation Handout #3





Impact Evaluation 



This section of the guidebook
addresses the question of 
whether the intervention (in 
this case, the implementation 
of a particular school reform 
model or approach) has made 
a difference at the school. For 
example, has it changed any 
school policy and practice? 
Strengthened instructional 
strategies? Improved student 
achievement? Has it contributed 
to the ultimate goal of providing 
opportunities for all students 
to meet high standards? The 
section presents several com- 
monly used evaluation models, 
discusses advantages and disad- 
vantages of each, and provides 
a step-by-step illustration of 
how each model can be imple- 
mented in a school setting. 






It is common practice to use 
the terms "outcome" and "im- 
pact" interchangeably. In this 
section, we make a distinction 
between the two words. Out- 
comes will be used to refer to 
any results or consequences of 
an intervention — in this case, a 
whole-school reform effort. Impact
is a particular type of outcome.
It refers to the ultimate
results or outcomes. In the case 
of whole-school reform, we are 
really talking about results for
students. For example, a whole- 
school reform effort can and 
usually does improve communi- 
cation among the faculty and 




school administrators. It may 
also increase parental involve- 
ment with school activities. 
These are certainly desired out- 
comes. However, the ultimate 
outcome we are looking for is 
improved student performance in 
academic subject areas, attitudes, 
and behavior. These outcomes 
will therefore be considered as 
impact. For purposes of this 
section, we will use the term 
impact evaluation to include 
both outcomes and impact 
(Yap, 1997). 

Outcomes can occur at many 
levels. We can assess outcomes 
at three interrelated levels: system,
teacher, and student. At
the system level, the interven- 
tion may have changed the way 
the school allocates resources 
and time for instruction. It may 
have affected its policy on pro- 
fessional development. At the 
teacher level, instructional 
strategies may have changed 
as a result of the intervention. 
Assessment practices may have 
been affected. At the student 
level, performance may have 
changed or improved on vari- 
ous measures. 



Impact evaluation should be 
conducted only after a program 
has attained a sufficient level 
of stability. In practice, impact 
evaluation should be preceded 
by an implementation evaluation 
to make sure that the intended 
program elements have been put 
in place before we attempt to 
look at their effects. Assessing 
the impact of a nonentity — a 
program that has yet to be put 
in place — is meaningless and a 
waste of resources that can be 



put to better use (such as en- 
suring a high-fidelity imple- 
mentation of the program). 

Evaluation Models 

The central question to be
addressed in an impact eval- 
uation is whether the interven- 
tion, in this case a whole-school 
reform effort, has made a dif- 
ference for the target groups. 
There are of course different 
ways to find out whether the 
effort has made a difference. 
The different ways are some- 
times described as evaluation 
models. The models can differ 
in many ways. An important 
difference is the extent to 
which the results they produce 
allow us to connect the imple- 
mentation of various program 
elements with the outcomes or 
impact — to make a causal link 
between the two. This is some- 
times described as the scientific 
rigor or validity of the model. 
In other words, some models 
are more likely than others to 
produce results that allow us 
to establish a causal link. 






There can be as many models 
as there are program evaluators. 
However, the most commonly 
used models are: pretest-posttest 
model, comparison group model, 
regression model, and control 
group model. While the models 
are different, each must establish
a standard or expectation
against which to examine the
program results. In other words,
each must address this important
question: What would be
the expectation if the intervention
was not implemented at
the school? That is, how would
students have performed without
the program?

Figure 2. Evaluation model

For example, in the pretest- 
posttest model, the expectation 
is that without the intervention, 
things will continue to go the 
way they have gone before. 
Teachers will continue to teach 
as they did before, and students 
will continue to perform as they 
did before. The baseline before 
the intervention will in fact be 
the expectation. Any difference, 
positive or negative, that occurs 
following the intervention is 
therefore attributable to the 
intervention (Tallmadge, 1982). 

In the control/comparison
group models, the standard or 
expectation is that without the 
intervention, things should be 
very much like those that exist 
in a similar or equivalent 
school or group of students. 

The critical issue is, of course, 
to identify and select an equiv- 
alent or similar school or group 
of students to be the control or 
comparison group. 



The regression model uses a 
statistical method to predict or 
project what things would have 
been like without the interven- 
tion. The method takes into ac- 
count most, if not all, relevant 
factors, including such things 
as current status and critical 
contextual variables (for exam- 
ple, demographic and socioeco- 
nomic backgrounds of schools 
and students). 
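
As a rough illustration of the regression idea, the Python sketch below fits a straight line to data from schools without the intervention and uses it to project what a project school's score would have been. It is a deliberate simplification: it uses a single predictor (the prior-year score), whereas the model described here takes many factors into account. The function name fit_line and all of the numbers are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("predictor values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical scores for schools WITHOUT the intervention:
# prior-year score (current status) and the score one year later.
prior_scores = [40, 50, 60, 70]
later_scores = [44, 53, 62, 71]

slope, intercept = fit_line(prior_scores, later_scores)

# Hypothetical project school: project its expected score, then compare with the actual one.
project_prior = 55
project_actual = 63
expected = slope * project_prior + intercept
difference = project_actual - expected

print(f"expected {expected:.1f}, actual {project_actual}, difference {difference:.1f}")
```

The difference between the actual and the projected score is what would be examined as a possible effect of the intervention, subject to how well the projection captures the relevant factors.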

For each model, once the no- 
intervention standard or expec- 
tation is set up, the actual state 
of affairs (instructional practice,
say, or student performance) is 
then compared with the expec- 
tation. With varying degrees of 
confidence, we then attribute 
the difference to the interven- 
tion as illustrated above. 
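
The comparison of actual results with a no-intervention expectation can be written out as a small Python sketch. The function name estimated_effect and all percentages are hypothetical; the point is only that the same actual result yields a different estimate depending on which expectation is used.

```python
def estimated_effect(actual, expected):
    """Difference between what was observed and the no-intervention expectation."""
    return actual - expected


# Hypothetical percentages of students meeting standards.
actual_pct = 60.0       # project school after the intervention
baseline_pct = 52.0     # same school before the intervention (pretest-posttest model)
comparison_pct = 55.0   # similar school without the intervention (comparison group model)

effects = {
    "pretest-posttest": estimated_effect(actual_pct, baseline_pct),
    "comparison group": estimated_effect(actual_pct, comparison_pct),
}
print(effects)
```

How much confidence to place in either figure depends on how well the chosen expectation represents what would have happened without the program.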

Each model, however, implic- 
itly makes the "other things 
being equal" assumption. That 
is to say, other than the inter- 
vention — the whole-school re- 
form effort — there is no signif- 
icant difference between the 
project students and students 
used to set up the standard or 
expectation. This assumption, 
of course, is not always true. To 
the extent that this assumption 
does not hold, it is difficult to 
make a connection between pro- 
gram implementation and im- 
pact. In other words, it becomes 
problematic to attribute the out- 
comes or impact to the program. 







Pretest-Posttest Model 

This model makes the assumption
that without the intervention,
things will go on as
they did before. Other things 
being equal, teachers will con- 
tinue to teach as they did be- 
fore, and students will continue 
to show the same pattern of 
achievement as they did before. 
With the intervention, things 
will change over time, it is
hoped, in a positive way.

This model assumes that the 
intervention occurs between 
pretest and posttest. Any dif- 
ference that is detected between 
the two points in time will be at- 
tributed to the intervention. The 
model can include repeated mea- 
sures. For example, both teaching 
practice and student achievement 
can be measured repeatedly at 
predetermined intervals (for example,
twice a year or annually).
The pattern of change at differ- 
ent points in time can then be 
interpreted as a result of the in- 
tervention. If the pattern of stu- 
dent achievement shows an 
upward trend over time (say, 
several years) then one can in- 
terpret the trend as evidence of 
sustained effects of the interven- 
tion (Blum, Yap, & Butler, 1991; 
Kushman & Yap, 1999). 

Ideally, pretest and posttest measures should be taken from intact cohorts of students (the same students at two or more points in time). This is especially important when the intent is to measure gains of individual students. However, in a school setting, pretest and posttest measures, or repeated measures over a longer period of time, are typically taken from non-intact cohort groups. For example, assessments may be conducted with third-graders at a school on an annual basis. In this case, the measurements are obviously not taken from the same students. While the unknown bias that may result is a concern, it is less critical when we are primarily interested in knowing how a school, as a unit of change, is being affected by the intervention over time.

To get a longitudinal perspective, the pretest-posttest model can be implemented as a quasi-time-series model where repeated measures are taken over several years. For example, assessments can be conducted on an annual basis to identify longitudinal patterns and trends in student outcomes, as shown below. The line graph shows increasing percentages of students meeting state standards from 1997 (baseline year) through 2000.

Figure 3. Pretest-Posttest model (horizontal axis: School Year, 1997 through 2000)
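
A school that keeps its annual results in a simple table can check for this kind of trend with a few lines of Python. The sketch below is illustrative only: the percentages are hypothetical (they are not the values plotted in the figure), and the function names are assumptions.

```python
def yearly_changes(series):
    """Return the change from each year to the next, keyed by the later year."""
    years = sorted(series)
    return {later: series[later] - series[earlier]
            for earlier, later in zip(years, years[1:])}


def consistently_upward(series):
    """True if every year-to-year change is positive."""
    return all(change > 0 for change in yearly_changes(series).values())


# Hypothetical percentages of students meeting state standards; 1997 is the baseline year.
percent_meeting_standards = {1997: 80, 1998: 85, 1999: 90, 2000: 95}

changes = yearly_changes(percent_meeting_standards)
print(changes)
print(consistently_upward(percent_meeting_standards))
```

Because the yearly measurements usually come from different groups of students, an upward pattern describes the school as a unit of change rather than gains by individual students.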

Typically, program outcomes 
and impact are measured longi- 
tudinally over several years. A 
consistently positive or upward 
trend can provide compelling 
evidence that the intervention 
is producing positive results. 

It is, however, difficult to rule 
out completely the possibility 
that the positive trend is the 
result of some other factors (such 



as change in student population 
or change in teaching staff). 

Implementation Steps. The pretest-posttest model is relatively easy to implement. Important steps include the following:

1. Decide what outcomes you want to look at

2. Select or develop instruments to collect the pertinent data

3. Decide whether sampling is desired

4. Administer the instruments to target groups at pretest time (for example, the beginning of school year)

5. Administer the instruments at posttest time (for example, the end of school year)

6. Analyze and interpret the evaluation data

7. Report findings to stakeholder groups

8. Use evaluation data for accountability and program improvement

Advantages. The greatest advantage of the pretest-posttest model is that it is highly feasible in a school setting. It does not require a control or comparison group or a high level of statistical expertise to implement the model. It is one of the least intrusive models and it does not impose a heavy data burden on teachers and students. It can assess progress against a baseline. Further, it can measure growth or an absolute level of performance (Messick, 1985). For example, we can measure growth (an increase of 10 percent) toward meeting state standards. Alternatively, we can assess the extent to which an absolute level of performance (e.g., 60 percent of students meeting state standards for a particular school year) is attained.

Disadvantages. The greatest disadvantage of this model is that it lacks scientific rigor unless it is implemented as a true time-series model, using intact cohorts. In a true time-series model, the intervention is introduced and withdrawn at will or at random at various points in time. The assumption is that when the intervention is withdrawn at any point in time, things will revert to the preintervention status. In a school setting, however, it is seldom, if ever, possible to introduce and withdraw an intervention at will over time.
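
The two ways of reading pretest-posttest results, growth against a baseline and attainment of an absolute level, can be sketched in Python as follows. The function names and the pretest and posttest figures are hypothetical, and the sketch assumes that an "increase of 10 percent" is read as 10 percentage points.

```python
def growth(pretest_pct, posttest_pct):
    """Change in the percentage of students meeting standards, in percentage points."""
    return posttest_pct - pretest_pct


def meets_target(posttest_pct, target_pct):
    """True if the posttest result reaches the absolute performance level."""
    return posttest_pct >= target_pct


# Hypothetical results: percent of students meeting state standards.
pretest_pct = 50.0    # baseline
posttest_pct = 60.0   # one year later
target_pct = 60.0     # absolute level the school set as its goal

gain = growth(pretest_pct, posttest_pct)
reached = meets_target(posttest_pct, target_pct)
print(f"growth: {gain} points; target reached: {reached}")
```

A school can show strong growth without reaching its target, or reach the target with little growth, so it is worth deciding in advance which of the two questions the evaluation is meant to answer.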

The following example illus- 
trates the use of the pretest- 
posttest model to assess the 
impact of a school reform model. 



The statewide assessment pro- 
gram conducts testing of stu- 
dents in grades three and five 
in two core subject areas — read- 
ing and mathematics. The assess- 
ment takes place in April each 
year. The school also participates 
in districtwide writing assess- 
ment with grade five students 
in April each year. 

The school leadership team 
wants to know if student per- 
formance is improving with the 
implementation of the school 
reform model. The team chooses 
to use the pretest-posttest model 
to conduct an impact evaluation 
of the school reform model. To 
take advantage of existing data 
available from the statewide 
assessment program, the team 
decides to use an annual test- 
ing cycle — April to April — 
rather than fall-to-spring to 
assess impact. 



trict office. The only data col- 
lection instrument that needs 
to be created is a data form to 
provide summary data on stu- 
dent attendance — number of 
days absent per school year. 

Step #3 

Student achievement data are 
obtained — electronically when 
feasible — from the statewide 
assessment program for grades 
three and five. There are ap- 
proximately 60 students in 
each of these grades. Data are 
obtained for all the students. 

No sampling is needed or de- 
sired. In addition, writing as- 
sessment data are obtained 
for all students in grade five. 
Attendance data are collected 
from school attendance records 
for all students in grades three 
and five. No sampling proce- 
dures are used. 



Pretest-Posttest 
Model — An Example 

The Jefferson Elementary
School has an enrollment 
of 500 students in kindergarten 
through grade five. The school 
has a very diverse student pop- 
ulation with 35 percent minor- 
ity students. Approximately 60 
percent of the students are in 
the free or reduced-price lunch 
program. Jefferson has just 
adopted a comprehensive school 
reform model — Reading Enhance- 
ment — for schoolwide implemen- 
tation. A school leadership team 
is formed to oversee the school 
improvement effort. 




Step #1 

The school leadership team, 
following extensive discussions 
with school staff, parents, and 
members from the community, 
decides to look at student per- 
formance in four areas: reading, 
mathematics, writing, and at- 
tendance. Even though the 
school reform model is focused 
on reading, the school and the 
community feel that it is impor- 
tant to look at other success in- 
dicators for the entire school. 

Step #2 

Most of the pertinent data 
will come from the statewide 
assessment program, including 
student achievement in reading 
and mathematics. Writing as- 
sessment data (for grade five 
only) will come from the dis- 



Step #4 

In the preceding school year, 
after receiving training in test 
administration from state-level 
staff as part of the statewide as- 
sessment process, the classroom 
teachers administer the criterion- 
referenced tests in reading and 
mathematics to students in 
grades three and five in April. 
The tests, which have been 
aligned with the state content 
standards, consist of multiple- 
choice items and a few open- 
ended items. The tests are 
scored by a vendor and the 
results provided to the school 
and district as well as the state 
department of education. 






In addition, the writing as- 
sessment is conducted with 
students in grade five following 
procedures established by the 
district office. 

Step #5 

Also as part of the statewide as- 
sessment process, the criterion- 
referenced tests in reading and 
mathematics are administered 
to students in grades three and 
five in April in the current 
school year. In addition, the 
writing assessment is conducted 
with students in grade five fol- 
lowing procedures established 
by the district office. 

Step #6

A database is set up to store and 
manage all the data, including 
attendance data collected at the 
end of the school year. The data- 
base contains statewide assess- 
ment data (reading and math- 
ematics) as well as districtwide 
writing assessment data for the 
current and preceding school 
years. The data are analyzed 
to provide percentages of stu- 
dents (grades three and five) 
who meet the state standards 
or benchmarks for the current 
school year and the preceding 
school year — prior to the im- 
plementation of the school 
reform model. A difference in 
percentage points provides an 
indication of impact. Attendance 
data are analyzed to provide an 
average (mean or median) num- 
ber of days absent for each 
school year. Similar analyses 
will be conducted in future 
years to detect any consistent 
trends and patterns. 
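
The Step #6 analysis can be sketched in a few lines of Python. This is only an illustration: the scores, the cut score of 200, and the absence counts below are made-up values, and the function name `percent_meeting` is hypothetical rather than part of any assessment system.

```python
from statistics import mean, median

def percent_meeting(scores, cut_score):
    """Percentage of students whose score is at or above the cut score."""
    return 100.0 * sum(s >= cut_score for s in scores) / len(scores)

# Hypothetical grade-three reading scores; the cut score is an assumption.
CUT_SCORE = 200
preceding_year = [185, 192, 201, 210, 198, 205, 190, 215, 199, 203]
current_year = [195, 204, 201, 212, 198, 209, 202, 218, 199, 207]

pct_before = percent_meeting(preceding_year, CUT_SCORE)
pct_after = percent_meeting(current_year, CUT_SCORE)
# Difference in percentage points between the two school years.
change_in_points = pct_after - pct_before

# Hypothetical days absent per student for one school year.
days_absent = [0, 2, 3, 3, 5, 7, 12, 1, 4, 30]
mean_absent = mean(days_absent)
median_absent = median(days_absent)

print(f"Meeting standard: {pct_before:.0f}% -> {pct_after:.0f}% "
      f"({change_in_points:+.0f} points)")
print(f"Days absent: mean {mean_absent:.1f}, median {median_absent:.1f}")
```

Reporting both the mean and the median for attendance can be useful because a few students with very many absences pull the mean upward while leaving the median largely unchanged.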



Step #7 

Results of the analysis are 
provided in reader-friendly 
data displays (e.g., bar charts 
and line graphs) and easy-to- 
understand narratives. They 
are shared and discussed with 
stakeholder groups, including 
school staff, site council, par- 
ents, and members of the com- 
munity. 

Step #8 

The results are provided to the 
district office and the state de- 
partment of education to deter- 
mine whether adequate progress 
has been made by the school. 

In addition, a meeting is held 
with the school leadership 
team, other key school staff, 
parents, and community members
for an in-depth review of
the data to explore plausible 
reasons for the findings and 
to develop recommendations 
and an action plan for contin- 
uous improvement. 

Comparison Group Model 

This model provides an expectation
of program outcomes based on a
comparable group (Kushman & Yap, 1999).
The comparison group, when 
selected appropriately, provides 
a basis for determining what 
might be expected to occur in 
the absence of the interven- 
tion. The comparison group 
should be similar (if not equiva- 
lent) to the intervention group 
in all relevant respects. Some 
of the pertinent factors include 
current achievement level, so- 
cioeconomic and related demo- 
graphic factors, school locale, 
and size. Other things being 
equal, any detected difference 



between the two groups can be
attributed to the intervention.
The bar graph on
Page 54 shows higher percent- 
ages of project students meet- 
ing state standards relative to 
their comparison counterparts 
in reading and mathematics. 

Implementation Steps. Impor- 
tant steps in implementing the 
comparison group model include 
the following: 

1. Decide what outcomes you 
want to look at 

2. Select or develop instruments 
to collect the pertinent data 

3. Identify and select a com- 
parison group 

4. Decide whether sampling is 
desired 

5. Administer the instruments 
to both project and comparison 
groups 

6. Analyze and interpret the 
evaluation data 

7. Report findings to stake- 
holder groups 

8. Use evaluation data for ac- 
countability and program im- 
provement 

The following example illus- 
trates the use of the compari- 
son group model to assess the 
impact of a school reform 
model. 










Figure 4. Comparison group model (axis label: Subject Area)



Comparison Group 
Model — An Example 

The Jefferson Elementary
School has an enrollment 
of 500 students in kindergarten 
through grade five. The school 
has a very diverse student popu- 
lation with 35 percent minority 
students. Approximately 60 per- 
cent of the students are in the 
free or reduced-price lunch pro- 
gram. Jefferson has just adopted 
a comprehensive school reform 
model — Reading Enhancement — 
for schoolwide implementation. 
A school leadership team is 
formed to oversee the school 
improvement effort. 

The statewide assessment 
program conducts testing of 
students in grades three and 
five in two core subject areas— 
reading and mathematics. The 
assessment takes place in April 
each year. The school also par- 
ticipates in districtwide writing 
assessment with grade five stu- 
dents in April each year. 



The school leadership team 
wants to know whether with 
the implementation of the 
school reform model students at 
Jefferson are performing better 
than students in comparable 
schools (e.g., schools with 
similar demographic character- 
istics). The team chooses to use 
the comparison group model to 
conduct an impact evaluation 
of the school reform model. To 
the extent feasible and appro- 
priate, the evaluation will take 
advantage of existing data 
available from the statewide 
assessment program to assess 
impact. 

Step #1 

The school leadership team, 
following extensive discussions 
with school staff, parents, and
members from the community, 
decides to look at student per- 
formance in four areas: reading, 
mathematics, writing, and at- 
tendance. Even though the 
school reform model is focused 
on reading, the school and the 



community feel that it is im- 
portant to look at other suc- 
cess indicators for the entire 
school. 

Step #2 

Most of the pertinent data 
will come from the statewide 
assessment program, including 
student achievement in reading 
and mathematics. Writing as- 
sessment data (for grade five 
only) will come from the dis- 
trict office. The only data col- 
lection instrument that needs 
to be created is a data form to 
provide summary data on stu- 
dent attendance — number of 
days absent per school year. 

Step #3 

In consultation with district- 
level staff, the leadership team 
identifies two schools in the 
district that are demographically
similar to Jefferson. In
School A, about 58 percent of 
the students are in the free or 
reduced-price lunch program. 






In School B, the percentage is 
62. Both schools have a diverse 
student population, with 35 per- 
cent minority students. School 
A has an enrollment of 400 stu- 
dents in kindergarten through 
grade five. School B has an en- 
rollment of 600 students in the 
same grade span. Neither School 
A nor School B is implementing 
a comprehensive school reform 
program. The Jefferson leader- 
ship team decides that both 
School A and School B will be 
used as comparison schools in 
the evaluation. 

Step #4 

Student achievement data are 
obtained — electronically when 
feasible — from the statewide 
assessment program for grades 
three and five. There are ap- 
proximately 70 or fewer stu- 
dents in each of these grades 
at Jefferson and the comparison 



Step #5 

As in past years, after receiving
training in test administration
from state-level staff as part of
the statewide assessment process,
the classroom teachers administer
the criterion-referenced
tests in reading and mathemat- 
ics to students in grades three 
and five in April at both Jeffer- 
son and the comparison schools. 
The tests, which have been 
aligned with the state content 
standards, consist of multiple-choice
items and a few open-ended items.
The tests are
scored by a vendor and the re- 
sults provided to the school and 
district as well as the state de- 
partment of education. 

In addition, the writing as- 
sessment is conducted with 
students in grade five following 
procedures established by the 
district office. Student atten- 



Advantages. This model has relatively strong scientific rigor, making it 
easier to attribute outcomes to the intervention. It is quite feasible when 
we can find naturally existing comparison groups (that is, student groups 
in a demographically similar school). In addition, it allows us to compare 
progress toward meeting common criteria (such as state standards). 

Disadvantages. It is often difficult to find an appropriate comparison group. 
In addition, the selected groups may differ in important but unknown ways. 

Another disadvantage is that data need to be collected for both 
intervention and comparison students, increasing the data collection 
burden and cost. 



schools. Data are obtained for 
all the students. No sampling 
is needed or desired. In addi- 
tion, writing assessment data 
are obtained for all students in 
grade five. Attendance data are 
collected from school atten- 
dance records for all students 
in grades three and five. No 
sampling procedures are used. 



dance data are obtained from 
school records at Jefferson. For 
the comparison schools, atten- 
dance data are provided by the 
district office. 






Step #6

A database is set up to store and 
manage all the data. The data- 
base contains statewide assess- 
ment data (reading and math- 
ematics), districtwide writing 
assessment data, as well as at- 
tendance data for both Jeffer- 
son and the comparison schools. 
The data are analyzed to pro- 
vide percentages of students 
(grades three and five) who 
meet the state standards or 
benchmarks in reading, math- 
ematics, and writing for both 
Jefferson and the comparison 
schools. For comparison pur- 
poses, results for the two com- 
parison schools are combined to 
provide a single percentage for 
each subject area. A difference 
in percentage points between 
Jefferson and the comparison 
schools provides an indication 
of impact. 

In addition, an analysis is 
conducted on the mean dif- 
ferences of standard scores 
in reading and mathematics 
as well as ratings in writing 
assessment between Jefferson 
and the comparison schools 
(combined). A t test is per- 
formed to determine the sta- 
tistical significance of each 
mean difference. A significant 
difference indicates that the 
intervention has, with a cer- 
tain statistical probability, 
made a real difference in stu- 
dent performance. In addition, 
for each grade and each subject 
area, an effect size is calcu- 
lated to assess the magnitude 
or educational significance of 
the difference. 
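
As a rough sketch of the t test and effect size described above, the following Python uses only the standard library. The guide does not say which t test or effect size formula is intended; the pooled-variance (equal variances) t statistic and Cohen's d used here are assumptions, and the scores are made up. The resulting t value would still need to be compared with a critical value for n1 + n2 - 2 degrees of freedom, which this sketch does not do.

```python
from math import sqrt
from statistics import mean, variance

def pooled_sd(a, b):
    """Pooled standard deviation of two independent samples."""
    na, nb = len(a), len(b)
    return sqrt(((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2))

def t_statistic(a, b):
    """Student's t for two independent samples, assuming equal variances."""
    return (mean(a) - mean(b)) / (pooled_sd(a, b) * sqrt(1 / len(a) + 1 / len(b)))

def effect_size(a, b):
    """Cohen's d: the mean difference in pooled standard deviation units."""
    return (mean(a) - mean(b)) / pooled_sd(a, b)

# Hypothetical grade-three reading standard scores.
jefferson = [212, 205, 198, 220, 207, 215, 201, 210]
comparison = [200, 195, 204, 190, 208, 199, 193, 203]  # Schools A and B combined

t = t_statistic(jefferson, comparison)
d = effect_size(jefferson, comparison)
print(f"mean difference = {mean(jefferson) - mean(comparison):.1f}")
print(f"t = {t:.2f}, effect size d = {d:.2f}")
```

The t statistic speaks to statistical significance, while the effect size expresses how large the difference is relative to the spread of scores, which is closer to what the guide calls educational significance.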

Attendance data are analyzed 
to provide an average (mean or 
median) number of days absent 






per school year for Jefferson 
students and their counterparts 
at the comparison schools.

Step #7 

Results of the analysis are pro- 
vided in reader-friendly data dis- 
plays (e.g., bar charts and line 
graphs) and easy-to-understand 
narratives. They are shared and 
discussed with stakeholder 
groups, including school staff, 
site council, parents, and members
of the community.

Step #8 

The results are provided to the 
district office and the state de- 
partment of education to de- 
termine whether adequate 
progress has been made by the 
school. In addition, a meeting 
is held with the school leader- 
ship team, other key school 
staff, parents, and community 
members for an in-depth review
of the data to explore plausible 
reasons for the findings and to 
develop recommendations and 
an action plan for continuous 
improvement. 

Regression Model 

Using a statistical procedure
called regression analysis, 
the model predicts or projects 
what things would have been 
like had there been no interven- 
tion (Fetler & Carlson, 1985; Yap, 
Estes, & Hansen, 1979; Yap, Estes, 
& Nickel, 1988; Yap, [September] 
1980). The projection can take 
into account a range of factors 
that may have an influence on 
the outcomes, including demo- 
graphics and current status of 
affairs. Typically, baseline status
and relevant demographic vari- 
ables are included in the regres- 



sion equation. Other things being 
equal, the difference between
actual outcomes and predicted
outcomes can be attributed to
the intervention. In the
example shown on the following 
page, the project students as a 
group (or individually) scored 
higher on the state assessment 
than the level predicted by the 
regression equation. 

The regression model is in 
many ways analogous to the 
baby growth chart one sees in 
a doctor's office. Based on such 
relevant information as a child's 
age, gender, and what is known 
about normal growth, the chart 
provides an expectation of the 
child's height and weight. Simi- 
larly, based on a student's grade 
level, current achievement status, 
and other relevant variables, the 
regression model provides an 
expectation on the student's 
achievement growth in core 
subject areas. 

Implementation Steps. Impor- 
tant steps in implementing the 
regression model include the 
following: 

1. Decide what outcomes you 
want to look at 

2. Select or develop instruments 
to collect the pertinent data 

3. Identify and obtain data 
needed to develop a regression 
equation 

4. Develop a regression equation 
to predict outcomes 

5. Decide whether sampling is 
desired 

6. Administer the instruments 
to target groups 



7. Analyze and interpret the 
evaluation data 

8. Report findings to stake- 
holder groups 

9. Use evaluation data for ac- 
countability and program im- 
provement 

The following example illus- 
trates the use of the regression 
model to assess the impact of a 
school reform model. 

Regression Model— 

An Example 

The Jefferson Elementary
School has an enrollment
of 500 students in kindergarten
through grade five. The school
has a very diverse student population
with 35 percent minority
students. Approximately 60 per- 
cent of the students are in the 
free or reduced-price lunch pro- 
gram. Jefferson has just adopted 
a comprehensive school reform 
model — Reading Enhancement — 
for schoolwide implementation. 
A school leadership team is 
formed to oversee the school 
improvement effort. 

The statewide assessment 
program conducts testing of 
students in grades three and 
five in two core subject areas — 
reading and mathematics. The 
assessment takes place in April 
each year. The school also par- 
ticipates in districtwide writing 
assessment with grade five stu- 
dents in April each year. 

The school leadership team 
wants to know whether student 
performance is improving with 
the implementation of the 
school reform model. Given the 



Figure 5. Regression model (vertical axis runs from 100 to 280; plotted labels: Expected, Actual, Regression)



intervention, are students per- 
forming as well as expected? 

The team chooses to use the 
regression model to conduct an 
impact evaluation of the school 
reform model. To the extent feasible
and appropriate, the evaluation
will take advantage of
existing data available from the 
statewide assessment program. 

Step #1 

The school leadership team, 
following extensive discussions 
with school staff, parents, and 
members from the community, 
decides to look at student per- 
formance in two core subject 
areas: reading and mathematics. 
Even though the school reform 
model is focused on reading, the 
school and the community feel 
that it is important to look at 
student performance in mathe- 
matics as well. 




Step #2 

Most of the pertinent data 
will come from the statewide 
assessment program, including 
student achievement in reading 
and mathematics. Relevant 
school-level demographic data, 
including percent of students 
in free or reduced-price lunch 
program and percent of minor- 
ity students, will also be ob- 
tained from the statewide 
assessment data system. No 
new data collection instru- 
ments are needed. 



Step #3 

Working with an external eval- 
uator, the school leadership team 
decides that three types of data 
will be included in the regression 
equation: student achievement 
in reading and mathematics 
(for preceding school year and 
current school year), percent 
of students in free or reduced- 
price lunch program, and per- 
cent of minority students. In 
the regression analysis, the 
predictor variables will include 
student achievement for the 
preceding school year, percent 
of students in free or reduced-



Advantages. The model can have a high level of scientific rigor if the
projection includes all of the pertinent factors. It takes advantage of 
existing data and does not require data collection from a control or 
comparison group. It statistically controls for extraneous factors 
affecting outcomes, making it possible to attribute program effects. 

Disadvantages. The feasibility of the model depends in large measure 
on the availability of sufficient archival data — data that already exist — 
on the pertinent variables. The model requires statistical skills that may 
not exist among school staff. In addition, because it is essentially a 
statistical procedure, the model can often be misused. 







price lunch program, and per- 
cent of minority students. The 
criterion or outcome variable 
is student achievement for the 
current school year. The evalu- 
ator will develop separate re- 
gression equations for reading 
and mathematics, using schools 
as units of analysis. The analy- 
sis will use school average 
scores — for grades three and 
five — instead of individual 
student scores. 

To achieve sufficient reliabil- 
ity, the leadership team feels 
that the regression equation 
should be based on all 120 ele- 
mentary schools in the state 
with grades three and five. All 
necessary data are obtained — 
electronically when feasible — 
from the statewide assessment 
data system. 

Step #4 

Using an appropriate data anal- 
ysis package (e.g., SPSS or Excel), 
the external evaluator develops 
two separate regression equa- 
tions to predict third-grade 
achievement — one for reading 
and one for mathematics. For 
each subject area, the equation 
predicts student achievement 
on the basis of achievement 
status for the preceding year, 
the percent of students in free 
or reduced-price lunch program, 
and the percent of minority 
students. For each of the schools 
included in the regression equa- 
tion, the average scale score for 
third-graders is used as a mea- 
sure of student achievement. 

The evaluator develops simi- 
lar regression equations to pre- 
dict fifth-grade achievement. 
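
One way to write the equation described in this step is shown below. The notation is an assumption introduced here for illustration and does not come from the guide: for school s, Y is the current-year average scale score, A is the preceding-year average scale score, L is the percent of students in the free or reduced-price lunch program, M is the percent of minority students, and the b terms are coefficients estimated from the statewide school data.

```latex
\hat{Y}_{s} = b_0 + b_1\,A_{s} + b_2\,L_{s} + b_3\,M_{s},
\qquad
\text{impact estimate}_{s} = Y_{s} - \hat{Y}_{s}
```

A positive impact estimate means the school scored above the level the equation predicts from its baseline achievement and demographics.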



Step #5 

All elementary schools in the 
state with grades three and five 
are included in the regression 
equation. No sampling proce- 
dures are used. 

Step #6 

As in past years, after receiving 
training in test administration 
from state-level staff as part of 
the statewide assessment process, 
the classroom teachers admin- 
ister the criterion-referenced 
tests in reading and mathemat- 
ics to students in grades three 
and five in April. The tests, 
which have been aligned with 
the state content standards, con- 
sist of multiple-choice items and 
a few open-ended items. The 
tests are scored by a vendor and 
the results provided to the school 
and district as well as the state 
department of education. 

The criterion-referenced tests 
provide a standard score in read- 
ing and mathematics for each 
student. 

Step #7 

The regression equations pro- 
vide predicted achievement 
levels (i.e., average standard 
scores) for third- and fifth- 
graders in reading and mathe- 
matics. The predicted average 
standard scores are compared
with the actual average stan- 
dard scores of third- and fifth- 
graders at Jefferson. The differ- 
ence is interpreted as an indi- 
cation of impact of the school 
reform model on student per- 
formance. 
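
The predicted-versus-actual comparison can be sketched as follows. This is a minimal illustration under stated assumptions, not the evaluator's actual procedure: it fits an ordinary least squares equation by solving the normal equations in plain Python, uses six made-up schools in place of the 120 schools in the example, and all function names and values are hypothetical. In practice a statistical package such as those named in Step #4 would be used.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit(rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    x = [[1.0] + list(r) for r in rows]  # leading 1.0 is the intercept term
    k = len(x[0])
    xtx = [[sum(row[i] * row[j] for row in x) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(x, y)) for i in range(k)]
    return solve(xtx, xty)

def predict(coefs, row):
    """Predicted outcome for one school from the fitted coefficients."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))

# Hypothetical schools: (preceding-year average score, % lunch program, % minority)
schools = [(200, 40, 20), (210, 30, 25), (190, 70, 40),
           (205, 55, 10), (195, 60, 35), (215, 20, 15)]
current_scores = [197.0, 207.25, 186.0, 199.5, 191.25, 212.25]

coefs = fit(schools, current_scores)
jefferson_predicted = predict(coefs, (198, 60, 35))
jefferson_actual = 199.0  # hypothetical
difference = jefferson_actual - jefferson_predicted
print(f"predicted {jefferson_predicted:.2f}, actual {jefferson_actual:.2f}, "
      f"difference {difference:+.2f}")
```

With so few schools the fitted equation would be far too unstable for real use; the sketch is meant only to show where the "difference" in this step comes from.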




The regression analysis iden- 
tifies a cluster of four schools 
that most closely resemble Jef- 
ferson with respect to demo- 
graphics. The average standard 
scores of these schools (com- 
bined) are compared with the 
average standard scores of Jeffer- 
son for third- and fifth-graders, 
respectively. The difference 
provides another indication of 
impact. A t test is performed to 
determine the statistical signif- 
icance of each mean difference. 
A significant difference indi- 
cates that the intervention 
has, with certain statistical 
probability, made a real differ- 
ence in student performance. 

In addition, for each grade and 
each subject area, an effect size 
is calculated to assess the magnitude
or educational significance
of the difference.

In addition, for each grade and 
each subject area, the percentage
of students meeting state
standards and benchmarks at
Jefferson is compared with the
percentage of students meeting 
standards and benchmarks at the 
four demographically similar 
schools. The difference provides 
yet another indication of impact. 

Step #8 

Results of the analysis are pro- 
vided in reader-friendly data dis- 
plays (e.g., bar charts and line 
graphs) and easy-to-understand 
narratives. They are shared and 
discussed with stakeholder 
groups, including school staff, 
site council, parents, and mem- 
bers of the community. 






Step #9

The results are provided to the 
district office and the state de- 
partment of education to de- 
termine whether adequate 
progress has been made by the 
school. In addition, a meeting 
is held with the school leader- 
ship team, other key school 
staff, parents, and community 
members for an in-depth review
of the data to explore plausible 
reasons for the findings and to 
develop recommendations and 
an action plan for continuous 
improvement. 

Control Group Model 

This is a true experimental design.
Properly implemented,
it requires random assignment 
of students to the intervention 
and control groups. Random as- 
signment ensures the compara- 
bility or equivalence of the two 
groups in all pertinent respects 
other than the intervention it- 
self (The Joint Committee on 
Standards for Educational Eval- 
uation, 1994). Any difference 
between the two groups with 
respect to outcomes is therefore 
directly attributable to program 
effects. In the example shown 
on the following page, higher 
percentages of project students 
meet state standards in reading 
and mathematics in comparison 
with their control counterparts. 

Implementation Steps. Impor- 
tant steps in implementing the 
control group model include the 
following: 

1. Decide what outcomes you 
want to look at 



2. Select or develop instruments
to collect the pertinent data 

3. Set up a control group 
through random assignment 
of students or other entities 
of interest 

4. Decide whether sampling is 
desired 

5. Administer the instruments to 
both project and control groups 

6. Analyze and interpret the 
evaluation data 

7. Report findings to stake- 
holder groups 

8. Use evaluation data for ac- 
countability and program im- 
provement 

The following example illus- 
trates the use of the control 
group model to assess the im- 
pact of a school reform model. 

Control Group Model — 

An Example 

The Jefferson Elementary
School has an enrollment 
of 500 students in kindergarten 
through grade five. The school 
has a very diverse student popu- 
lation with 35 percent minority 
students. Approximately 60 per- 
cent of the students are in the 
free or reduced-price lunch pro- 
gram. Jefferson has just adopted 
a comprehensive school reform 
model — Reading Enhancement — 
for schoolwide implementation. 
A school leadership team is 
formed to oversee the school 
improvement effort. 







The statewide assessment 
program conducts testing of 
students in grades three and 
five in two core subject areas — 
reading and mathematics. The 
assessment takes place in April 
each year. The school also par- 
ticipates in districtwide writing 
assessment with grade five stu- 
dents in April each year. 

The school leadership team 
wants to know whether with 
the implementation of the 
school reform model students 
at Jefferson are performing 
better than they would have 
without the intervention. The 
team wants to use an evalua- 
tion model with a high level 
of scientific rigor — the control 
group model — to assess the im- 
pact of the school reform effort. 
To the extent feasible and ap- 
propriate, the evaluation will 
take advantage of existing data 
available from the statewide 
assessment program. 

Step #1 

The school leadership team, 
following extensive discussions 
with school staff, parents, and 
members from the community, 
decides to look at student per- 
formance in four areas: reading, 
mathematics, writing, and atten- 
dance. Even though the school 
reform model is focused on read- 
ing, the school and the commu- 
nity feel that it is important to 
look at other success indicators 
for the entire school. 



Figure 6. Control group model (bars for Project and Control groups in Reading and Math; axis label: Subject Area)



Step #2 

Most of the pertinent data will 
come from the statewide as- 
sessment program, including 
student achievement in reading 
and mathematics. Writing as- 
sessment data (for grade five 
only) will come from the dis- 
trict office. The only data col- 
lection instrument that needs 
to be created is a data form to 
provide summary data on stu- 
dent attendance — number of 
days absent per school year.

Step #3 

Jefferson has three classes of 
third-graders and three classes 
of fifth-graders. Prior to the 
adoption of the school reform 
model, the leadership team 
works closely with the school 
administration and the teach- 
ing staff to reach a decision 
that, for the current school 
year, two of the three classes 
in each grade will participate 
in Reading Enhancement. The 




other class will serve as the con- 
trol group. Furthermore, the 
school administration is able 
to persuade parents to allow 
students to be randomly as- 
signed to the classes. 
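
Random assignment of this kind can be sketched in Python as shown below. The student IDs, class labels, and seed are hypothetical, and the function `assign_to_classes` is an illustration rather than a prescribed procedure. Recording the seed lets the assignment be reproduced and audited later.

```python
import random

def assign_to_classes(student_ids, class_names, seed):
    """Shuffle students and deal them round-robin into classes of near-equal size."""
    rng = random.Random(seed)      # seeded generator so the result is repeatable
    shuffled = list(student_ids)   # copy so the original roster is left unchanged
    rng.shuffle(shuffled)
    classes = {name: [] for name in class_names}
    for i, sid in enumerate(shuffled):
        classes[class_names[i % len(class_names)]].append(sid)
    return classes

# 66 hypothetical third-grade student IDs: S001 ... S066
third_graders = [f"S{n:03d}" for n in range(1, 67)]
class_names = ["Project 1", "Project 2", "Control"]
classes = assign_to_classes(third_graders, class_names, seed=2000)

for name in class_names:
    print(name, len(classes[name]), "students")
```

Because every student has the same chance of landing in any class, differences between the project and control classes at the outset are due to chance alone.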

Step #4 

Student achievement data are 
obtained — electronically when 
feasible — from the statewide 
assessment program for grades 
three and five. There are approx- 
imately 70 or fewer students in 
each of these grades at Jeffer- 
son. Data are obtained for all 
the students. No sampling is 



needed or desired. In addition, 
writing assessment data are ob- 
tained for all students in grade 
five. Attendance data are col- 
lected from school attendance 
records for all students in grades 
three and five. No sampling 
procedures are used. 

Step #5 

As in past years, after receiving 
training in test administration 
from state-level staff as part of 
the statewide assessment pro- 
cess, the classroom teachers ad- 
minister the criterion-referenced 
tests in reading and mathemat- 



Advantages. The model has a high level of scientific rigor. It provides the 
strongest basis for attributing the detected difference to the intervention. 
It has the potential of ruling out all extraneous factors that might have 
contributed to the outcomes. 

Disadvantages. The model is probably the least feasible to implement, 
particularly in a school setting. It is almost never feasible to randomly 
assign students to the intervention and control groups. The process can be 
very disruptive. Another disadvantage is that it requires data collection for 
both the intervention and control groups, increasing data burden and cost. 






ics to students in grades three 
and five in April. The tests, 
which have been aligned with 
the state content standards, con- 
sist of multiple-choice items and 
a few open-ended items. All stu- 
dents — those participating in 
Reading Enhancement and the 
control group students — take 
the tests. The tests are scored 
by a vendor and the results 
provided to the school and 
district as well as the state 
department of education. 

In addition, the writing assess- 
ment is conducted with students 
in grade five following proce- 
dures established by the district 
office. Student attendance data 
are obtained at the end of the 
school year from school records. 

Step #6 

A database is set up to store 
and manage all the data. The 
database contains statewide 
assessment data (reading and 
mathematics), districtwide 
writing assessment data, as 
well as attendance data for 
both project and control stu- 
dents. The data are analyzed to 
provide percentages of students 
who meet the state standards or 
benchmarks in reading, mathematics,
and writing, separately
for project and control students. 
A difference in percentage points 
between the two groups provides 
an indication of impact. 



In addition, an analysis is 
conducted on the mean dif- 
ferences of standard scores 
in reading and mathematics 
as well as ratings in writing 
assessment between project 
and control students. A t test 
is performed to determine the 
statistical significance of each 
mean difference. A significant 
difference indicates that the 
intervention has, with certain 
statistical probability, made a 
real difference in student per- 
formance. In addition, for each 
grade and each subject area, an 
effect size is calculated to as- 
sess the magnitude or educa- 
tional significance of the 
difference. 

Attendance data are analyzed 
to provide an average (mean or 
median) number of days absent 
during the school year for proj- 
ect and control students. 

Step #7 

Results of the analysis are pro- 
vided in reader-friendly data 
displays (e.g., bar charts and 
line graphs) and easy-to-under- 
stand narratives. They are shared 
and discussed with stakeholder 
groups, including school staff, 
site council, parents, and mem- 
bers of the community. 



Step #8 

The results are provided to the 
district office and the state de- 
partment of education to de- 
termine whether adequate 
progress has been made by the 
school. In addition, a meeting 
is held with the school leader- 
ship team, other key school 
staff, parents, and community 
members for an in-depth review
of the data to explore plausible 
reasons for the findings and to 
develop recommendations and 
an action plan for continuous 
improvement. 

Table 1 provides a summary 
of the models along with their 
respective advantages and dis- 
advantages. 






Impact Evaluation 







The Evaluation Process 

Regardless of which model is
used, the evaluation process 
consists of a series of critical 
steps, including the following: 






1. What questions do we want 
to address? 

2. What do we want to look
at? What indicators and 
measures do we use? 

3. How do we collect the 
data? 

4. How do we analyze the data? 

5. How do we interpret the 
data? What are the data 
telling us? 

6. How do we use data to 
improve the program? What 
follow-up actions should be 
taken? 

7. Are follow-up actions 
making a difference? 



These steps are interrelated. 
Each is further discussed below. 

Questions To Address 

For impact evaluation, the
overall question is whether 
and in what ways the interven- 
tion has made a difference for 
students, teachers, and the 
school as a whole. However, 
under this overall question, a 
host of more specific questions 
may be addressed by the evalu- 
ation. Examples include: 

■ How is the school and/or 
district administration provid- 
ing support for the school re- 
form effort? 

■ In what ways are teachers 
changing and improving their 
instructional practice? 





■ In what ways are students 
improving their performance? 

Evaluation questions can be 
framed with even greater speci- 
ficity as follows: 

■ Does the school reform effort 
result in an increased percent- 
age of third-grade students 
meeting state benchmarks
in reading and mathematics?

■ Does the school reform effort 
result in an increased percent- 
age of teachers participating 
in professional development 
activities? 

■ Does the school reform effort 
result in improved student at- 
tendance? 

■ Does the school reform effort 
result in a decreased number 
of discipline problems? 

Some of these questions may 
have come directly from the 
stakeholders. Others may be 
based on stated program goals 
and objectives. Yet others may 
address specific program perfor- 
mance indicators. It is impor- 
tant that all key stakeholders 
are involved in making the de- 
cision on what questions the 
evaluation should address. 






Choosing Indicators 
and Measures 

Once the evaluation questions
are formulated, it is
normally an easy step to decide 
what indicators (such as reading 
achievement or student atten- 
dance) and measures (scores on 
specific tests, for instance) we 
need to look at. As discussed 
earlier, these indicators can 
exist at various levels: school/ 
district administration, teachers, 
and students. For example, if the 
question has to do with the per- 
cent of students meeting state 
standards, then indicators may 
include student achievement in 
various academic areas (such as 
reading/language arts and/or
mathematics). 

Typically, indicators include 
student performance scores on 
the following measures: 

■ Norm-referenced tests 
■ Criterion-referenced tests 
■ Performance-based assess- 
ments 

Norm-referenced tests (NRTs) 
are the most widely used stan- 
dardized assessment tool in the 
United States. Their primary 
purpose is to provide a general 
portrayal of student perfor- 
mance in comparison with a 
norm group. A norm-referenced 
test typically consists of multi- 
ple-choice items in the areas of 
reading, language arts, mathe- 
matics, science, and social stud- 
ies. Typically developed by a 
commercial publisher, NRTs 
provide such normative scores 
as percentiles, stanines, normal 
curve equivalents (NCEs), grade 
equivalents, and scale scores. 
These metrics are highly efficient
for sorting and screening
purposes, but are limited in
indicating what students know
and can do at a particular 
grade level. 

Criterion-referenced tests 
(CRTs) are developed to assess 
the attainment of specific 
knowledge and skills. The test 
items, in a multiple-choice or 
an open-ended format, are con- 
structed to measure a particular 
skill or instructional objective 
(for example, sight vocabulary, 
reading fluency, recognition of 
the central theme of a story, 
addition with two-digit num- 
bers, basic algebraic concepts). 
In most cases, a cut score or 
mastery score is established to 
determine whether a student 
has mastered a specific skill. 

In this sense, assessments 
based on state standards or 
benchmarks are a form of crite- 
rion-referenced testing. Many 
states are using the services of 
commercial publishers to create 
their standards-based assess- 
ment systems. 



It is important to recognize that 
in addition to academic subjects, 
other indicators may also be 
pertinent, including the following: 

■ Attendance 

■ Dropout rates 

■ Discipline referrals 

■ Violence 



Performance-based assess- 
ments (PBAs) are created to 
provide students with opportu- 
nities to apply or demonstrate 
specific knowledge or skills in 
a particular content area. While 
a consensus has yet to emerge 
on a precise definition of
performance-based assessments,
such assessment devices generally
require the student to create a
response to an open-ended
question. Examples include
a short written answer,
a writing sample, an exhibition,
and a portfolio. The response 
is typically scored or rated ac- 
cording to a set of specific crite- 
ria described in a scoring guide 
or rubric. The best-developed 
and most widely used perfor- 
mance-based assessment is 
traits-based writing assessment. 
Student writing samples are typ- 
ically rated on a six-point scale 
for such traits as ideas, organi- 
zation, word choice, voice, and 
conventions. PBAs allow teach- 
ers to incorporate assessment as 
an integral part of instruction. 

Also typically, these assess- 
ment devices cover the follow- 
ing academic areas: 

■ Reading/language arts 

■ Mathematics 

■ Writing 

■ Science 

■ Social studies 

In standards-based school re- 
form, it is probably more appro- 
priate to look at indicators that 
are standards-based rather than 
norm-referenced. Most states 
have both content and student 
performance standards that ad- 
dress the question of what stu- 
dents should know and be able to 
do at various benchmark points. 
In this context, a critically
important indicator is the
percentage of students meeting the
state standards. Because they 
measure students against one 
another, rather than against 
an external standard, norm- 
referenced measures are not con- 
sistent with the notion that all 
students will attain a particular 
level of knowledge and skills. 

In addition to student out- 
comes, the evaluation may also 
look at indicators at the school 
and teacher levels. At the school 
level, we may want to find out 
whether and how the school ad- 
ministration is supporting the 
reform effort. Changes in policy 
and practice can occur in the 
following areas: 

■ Release time for teachers to
plan improvement activities 

■ Reallocation of time and 
resources for professional 
development 

■ Acquiring external techni- 
cal assistance to enhance 
staff capacity 

At the teacher level, the evalu- 
ation may look at the following: 

■ Incidence of collegial 
learning 

■ Use of effective teaching 
practice 

■ Redesigning the curriculum 

■ Use of assessment informa- 
tion to improve instruction 






Collecting Data 

Several decisions need to be
made here. For example, key 
decisions need to be made in the 
following areas: 

■ Which evaluation model is 
the most appropriate for ad- 
dressing the questions? 

■ What instruments should 
be used to collect the data? 

■ What are the data sources? 

■ Is sampling necessary or 
desired? 

■ Should we use multiple 
measures? 

Model Selection. Quite often, 
the evaluation question itself 
would suggest which evalua- 
tion model may be the most 
appropriate. For example, if 
we are interested in knowing 
not only whether the percent 
of students meeting state stan- 
dards is increasing but also 
whether the increase is greater 
than a comparable group, then 
the comparison group model 
is appropriate. On the other 
hand, if we are interested in 
knowing only whether the 
school is improving over time, 
then a pretest-posttest model 
may suffice. 

A model is seldom, if ever, 
entirely valid or invalid. Some 
models are generally more valid 
than others. There are other 
criteria schools should consider 
in choosing a particular model. 

First, we need to consider 
the purpose of the evaluation. 
When an evaluation is conducted 
for formative purposes (e.g., for 
program modification and re- 
finement), the ability to make a 
causal link may be less important
than when it is conducted
for high-stakes, summative
purposes (e.g., for program
continuation). A less rigorous model
may be adequate for exploratory, 
formative investigations. 

Second, we need to consider
feasibility. Generally, less
rigorous models are easier to
implement than more rigorous
models. For example, a true
experimental design with random
assignment of students to
experimental and control groups
is typically not feasible in the
regular school setting. The use
of naturally existing comparison
groups, while less rigorous, is
more feasible. Other factors
related to feasibility include
the intrusiveness of data
collection procedures as well as
staff time and expertise for data
collection and analysis. For
example, when teachers and school
administrators serve as data
collectors, data collection
methods need to be explicit and
relatively straightforward.

Third, cost is always an
important consideration.
Generally, the more rigorous
models are more expensive than
their less rigorous counterparts.
The evaluator must weigh the
importance and usefulness of the
information against the resources
needed to collect and analyze
the data. The model selected
should provide benefits
commensurate with the costs it
incurs.

Instrument Selection. Depending
on the nature of the specific
indicators you are looking at,
various instruments may be
appropriate for data collection.
For example, if the indicators
have to do with academic
achievement, some sort of tests
or assessment devices will be
required for data collection.
If the indicators deal with
teaching practice, a different
set of instruments will be used
to collect the relevant data.
Such instruments may include
interview protocols, observation
schedules, and/or focus-group
meetings. Like the evaluation
models, each data collection
method has its advantages
and disadvantages.

Researchers and evaluators 
have developed a variety of data 
collection methods, including: 

■ Document review 

■ Questionnaire survey 

■ Interview 

■ Focus group 

■ Observation 

■ Assessment of student
achievement

Some methods are better 
suited for the collection of cer- 
tain types of data. Each has ad- 
vantages and disadvantages in 
terms of costs and other practi- 
cal and technical considerations 
(such as ease of use, accuracy, 
reliability, and validity). For 
example, there is no best way 
to conduct interviews. Your 
approach will depend on the 
practical considerations of getting
the work done during the
specified time period. Using a 
focus group — which is essen- 
tially a group interview — is more 
efficient than one-on-one inter- 
views. However, people often 
give different answers in groups 
than they do individually. They 
may feel freer to express per- 
sonal views in a private inter- 
view. At the same time, group 
conversations can draw out 
deeper insights as participants 
listen to what others are saying. 
Both approaches have value. 
Schools must weigh pros and 
cons against program goals. 

For both focus groups and in- 
terviews, the evaluator should 
work from a written interview 
guide that lists the questions 
and also provides space where 
the interviewer can record an- 
swers. Good interview ques- 
tions should be open-ended 
questions written in a clear, 
simple, conversational style. 



If your data collection plan 
calls for classroom observations, 
the evaluator needs to develop 
a guide that describes what he 
or she is looking for in the class- 
room. For example, the observer 
may be asked to look for ways 
the inservice training has 
changed classroom practice. 

Or the observer may be asked to note
whether the teacher is using 
certain program materials. Dur- 
ing the visit itself, the evalua- 
tor should avoid disrupting the 
classroom activity. It is best if 
the evaluator sits in an unob- 
trusive place and uses the guide 
to focus on the relevant class- 
room actions. 

The Data Collection Matrix on 
the next page summarizes the 
advantages and disadvantages 
of each method. 



[Table 2. Data Collection Matrix]




Case studies are not listed 
as a data collection method 
because they typically employ 
some or all of the data collec- 
tion methods under conditions 
specified in a fieldwork plan. 

A well-designed case study not 
only provides a rich documen- 
tation of program implementa- 
tion and outcomes but can often 
help make a logical connection 
between program activities and 
the desired outcomes. 

Data Sources. Various sources
exist from which the evaluator
may collect the pertinent data.
Archival sources consist of
existing documents from which a
wide array of data (such as
student assessment data,
attendance, and discipline
referrals) may be available. The
primary data sources will
probably be people who are
participating in the school
reform effort, including
students, teachers, school
administrators, parents, and
community members. Typically,
survey and interview data on
program implementation and
outcomes will come from teachers,
school administrators, parents,
and community members.
Achievement data will be
gathered from students.

While each data source can
provide valuable information
on the selected indicators,
care should be taken in deciding
which data source may be best
for which type of information.
For example, data on teacher
professional development can
come from teachers or school
administrators. Generally,
teachers will be a better data
source in this case because they
have firsthand knowledge of
the staff development activity
and can provide a more valid
and accurate picture of what
took place and its potential
impact. Similarly, in some
cases, teachers' self-reports
on instructional practice may
be less accurate than data
obtained from onsite observation
by a trained observer.

In addition, many data sources
can be strengthened by some
preparatory work. For example,
a good explanation of the
purpose of the evaluation, clear
and concise instructions for
completing a written survey,
and a well-developed focus
group guide can all enhance
the validity of the data.
Making sure that students know
the purpose of a particular
assessment and have adequate
test-taking skills can also
increase the validity and
accuracy of the assessment results.

Multiple Measures. In many 
instances, it is unlikely that a 
single measure will adequately 
assess the extent to which a 
program objective is attained, 
especially when the objective 
entails complex and multifaceted 
knowledge and skills on the part 
of students or teachers. In such 
cases, the use of multiple mea- 
sures and approaches can en- 
hance the validity, reliability, 
equity, and utility of the data 
as well as decisions about
teaching and learning. Multiple
measures should be used to 
capitalize on the strengths of 
each data collection method. For 
example, survey data on changed 
practices at the classroom level 
can be supplemented with onsite
observation data to enhance
validity. Similarly, the
validity of student performance 
data is enhanced when such 
data are gathered with differ- 
ent approaches and formats, 
including criterion-referenced
tests, multiple-choice tests, 
writing samples, completion 
of tasks and projects, and 
portfolios of student work. 

Sampling. Sampling can re- 
duce data collection cost as 
well as burden on respondents. 
Matrix sampling, for example, 
allows a selected sample of the 
target population (for example, 
teachers or students) to respond 
to a selected sample of test or 
survey items. It reduces the 
amount of time and other re- 
sources for data collection in 
comparison with a study that 
requires the participation of all 
members of the target group. On 
the down side, sampling reduces 
the amount of information avail- 
able for individual students and 
teachers, and may make it dif- 
ficult to disaggregate data. 






Sampling units can be indi- 
viduals (such as students or 
teachers), grade levels, schools, 
districts, or even states in a 
large-scale study. A simple 
random sample of individual 
students will consist of stu- 
dents randomly selected from 
the entire school, district, state, 
or nation. Similarly, a simple 
random sample of schools will 
consist of schools randomly se- 
lected from the district, state, 
or nation. 

The most efficient sampling 
method (with the smallest 
sampling error) is stratified 
random sampling (Sudman, 
1976). For example, within a 
school, you can first randomly
sample grade levels and then 
randomly select students 
within each grade level se- 
lected. The stratification fac- 
tors can be any variables that 
may potentially affect the out- 
comes, including grade level, 
gender, ethnic group, and 
poverty status. 
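
The following Python sketch shows one way a stratified random sample might be drawn. It assumes the same fraction is sampled from every stratum (here, every grade level); the field names, the 25 percent fraction, and the student list are hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(students, stratum_key, fraction, seed=0):
    """Randomly draw the same fraction of students from every stratum."""
    rng = random.Random(seed)  # fixed seed so the draw can be reproduced
    strata = defaultdict(list)
    for student in students:
        strata[student[stratum_key]].append(student)

    sample = []
    for key in sorted(strata):
        members = strata[key]
        # Take at least one student per stratum so no group is left out.
        k = max(1, round(fraction * len(members)))
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical roster: 60 students spread evenly over grades 3, 4, and 5.
students = [{"id": i, "grade": 3 + i % 3} for i in range(60)]
sample = stratified_sample(students, "grade", fraction=0.25)
print(len(sample), "students sampled")
```

Other stratification factors named above, such as gender or poverty status, could be used by passing a different key.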

Data Quality. Selecting and 
using an appropriate evalua- 
tion model, instruments, data 
sources, sampling methods, and
multiple measures will help en- 
sure that high quality data are 
collected for the evaluation. 
Several criteria can be used to 
assess data quality, including 
validity, reliability, accuracy, 
and utility.

Validity is the most important 
consideration. The selected 
instrument, whether it is a 
norm-referenced test, a
criterion-referenced test, or
performance-based assessment,
should measure what it is
supposed to measure. For example,
a test consisting of only multi- 
ple-choice items is not likely 
to provide valid information on 
students' higher order thinking 
skills. The selected instrument 
should have construct validity 
in the sense that it measures 
concepts and skills that are the 
targets of instruction. For stan- 
dards-based assessment, the 
instrument should be aligned 
with state content standards 
as well as classroom instruction. 

Reliability refers to the consis- 
tency of assessment results. For 
example, a test should provide 
very similar, if not identical, 
results if it is given to the 
same students twice over a 
short period of time (e.g., a 
week or two). When this is the 
case, the test is said to have 
high test-retest reliability. In 
addition, the items in the test 
should "hang together" in the 
sense that they measure the 
same skills and knowledge as 
indicated by an internal con- 
sistency measure. In writing 
assessment, reliability means 
that two or more trained raters 
using the same scoring rubrics 
should provide highly similar, 
if not identical, ratings for the 
same writing samples. 
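
Two of the reliability checks described above can be sketched in Python. The sketch assumes that test-retest reliability is summarized with a Pearson correlation and that rater consistency is summarized with exact and adjacent (within one point) agreement rates. These are common choices, not ones prescribed by this guide, and all scores shown are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two paired lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sy = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sx * sy)

def agreement_rates(rater_1, rater_2):
    """Share of papers with identical ratings, and with ratings within one point."""
    n = len(rater_1)
    exact = sum(a == b for a, b in zip(rater_1, rater_2)) / n
    adjacent = sum(abs(a - b) <= 1 for a, b in zip(rater_1, rater_2)) / n
    return exact, adjacent

# Hypothetical scores for the same students tested twice, two weeks apart.
first_testing = [41, 35, 50, 28, 44, 39]
second_testing = [43, 34, 49, 30, 45, 37]
print(f"Test-retest r = {pearson_r(first_testing, second_testing):.2f}")

# Hypothetical six-point writing ratings from two trained raters.
exact, adjacent = agreement_rates([3, 4, 5, 2, 4], [3, 5, 5, 2, 3])
print(f"Exact agreement: {exact:.0%}, within one point: {adjacent:.0%}")
```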

Accuracy means that the assess- 
ment results are relatively free 
of measurement or sampling 
errors. These errors can come 
from poor test administration, 
use of inappropriate sampling 
procedures, and/or inadequate 
attention to scoring rubrics. 
Error sources can be minimized
by developing clearly written
instructions for test administration
and scoring. When measurement
errors are known to exist,
they should be taken into ac- 
count in data interpretation. 

Finally, high quality data 
should also be user-friendly. 
This is particularly important 
when data are intended to be 
used by school staff to improve 
instruction or the entire pro- 
gram. It is critical that the data 
be meaningful to teachers and 
school administrators if they 
are expected to use the data 
for improvement purposes. In- 
volving school staff and parents 
in designing data collection ac- 
tivities can go a long way to en- 
hancing data utility — that the 
data will be used as intended. 

Data Collection Schedule. De- 
pending on the impact questions 
being addressed and the evalu- 
ation model used, data need 
to be collected at appropriate 
times during the school year. 

In many cases, the evaluator 
may be able to take advantage
of data collection procedures 
that have already been put in 
place (e.g., a statewide assess- 
ment system). In other cases, 
the evaluator may be able to 
use archival data (i.e., data
that have already been collected).
In any case, it is helpful to
conduct a "data audit" to find
out any and all existing data
that can be used to address
the evaluation questions
before initiating new data
collection activities.



In general, evaluation data
should be collected repeatedly
over time to demonstrate
patterns and trends of student
performance. For example,
in the pretest-posttest model,
data should be collected for
at least two time points (e.g.,
at the beginning and end of
a school year). It is, however,
helpful to continue to collect
data for additional time points
on a regular basis (e.g.,
fall-spring, spring-spring,
fall-fall, or some other annual
cycles) over several years. This
allows us to show performance
trends and patterns as well as
the sustained effects of the
intervention. For the other
models, longitudinal data are
similarly desirable. The chart
on this page provides examples
of schedules for collecting
student performance data for
each evaluation model.

The schedules on this page
are examples only and should
be modified to take advantage
of existing data collection
activities. For example,
statewide assessment, which
often provides much of the
needed student performance
data, may occur in March (or
some other time of the school
year) instead of April. In that
case, March or another month of
the school year will become the
pretest and/or posttest date.

Also, student performance
data may be collected more
frequently than fall to spring
or once a year for instructional
improvement purposes. Many
comprehensive school reform
models require the collection of
assessment data on an ongoing
basis (e.g., every eight weeks).
Such data can and should be
used for instructional planning
as well as for impact evaluation.

Pretest-Posttest

Option A: Fall-Spring
September (Pretest)
April (Posttest)

Option B: Annual (Spring-Spring)
April-April (Pretest-Posttest)

Note. In this model, data are collected from project students only.

Comparison Group

Option A: Pretest-Posttest
September (Pretest)
April (Posttest)

Option B: Posttest Only
April (Posttest)

Note. In this model, data are collected from both project and
comparison group students.

Regression

September/October: Collection of demographic and other relevant
contextual data (e.g., free or reduced-price lunch status and pretest
scores)

April: Collection of posttest data

Note. In this model, data are collected from a larger population of
students of which the project students may be a part (e.g., districtwide
or statewide student population).

Control Group

Option A: Pretest-Posttest
September (Pretest)
April (Posttest)

Option B: Posttest Only
April (Posttest)

Note. In this model, data are collected from both project and control
group students.

Data Management. There is 
a wide range of software pack- 
ages that the evaluator can use 
to manage the evaluation data. 
For example, the evaluator can 
set up a database with SPSS, 
Access, or Excel. Each requires 
a different level of technical 
expertise. For a relatively small 
school or district, Excel (the
simplest of the three programs)
should work well as database
software. For schools or districts
with larger student enrollments, 
SPSS or Access may be more ef- 
ficient. For all software packages, 
the user manual typically pro- 
vides instructions for setting 
up and managing a database. 

Regardless of which software 
is used, the database should 
have the following capabilities: 

■ Include student achievement 
data on core subject areas (e.g., 
reading/language arts and 
mathematics) 

■ Include individual student 
demographic information 
(e.g., gender, ethnicity, mi- 
grant status, language profi- 
ciency status, disability status, 
economically disadvantaged 
status) 

■ Include data on other con- 
textual variables (e.g., atten- 
dance, teacher-student ratio, 
instruction, discipline, and 
violence) 

■ Track student performance 
over time (e.g., several years) 

■ Aggregate and disaggregate 
data (e.g., for total student 
population and various sub- 
groups) 



■ Include procedures for data
analysis using both descriptive 
and inferential statistics 
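
To make these capabilities concrete, the sketch below sets up a very small evaluation database. The guide names SPSS, Access, and Excel; the sketch instead uses SQLite through Python's built-in sqlite3 module, purely as an assumption for illustration. All table names, column names, and records are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # temporary in-memory database
conn.executescript("""
CREATE TABLE students (
    student_id         INTEGER PRIMARY KEY,
    gender             TEXT,
    econ_disadvantaged INTEGER          -- 1 = yes, 0 = no
);
CREATE TABLE scores (
    student_id   INTEGER REFERENCES students(student_id),
    school_year  TEXT,                  -- allows tracking over time
    subject      TEXT,
    met_standard INTEGER                -- 1 = met, 0 = did not meet
);
""")
conn.executemany("INSERT INTO students VALUES (?, ?, ?)",
                 [(1, "F", 1), (2, "M", 0), (3, "F", 0), (4, "M", 1)])
conn.executemany("INSERT INTO scores VALUES (?, ?, ?, ?)", [
    (1, "1998-99", "reading", 0), (1, "1999-00", "reading", 1),
    (2, "1998-99", "reading", 1), (2, "1999-00", "reading", 1),
    (3, "1998-99", "reading", 0), (3, "1999-00", "reading", 1),
    (4, "1998-99", "reading", 0), (4, "1999-00", "reading", 0),
])

# Disaggregate: percent meeting the reading standard by year and subgroup.
rows = conn.execute("""
    SELECT sc.school_year, st.econ_disadvantaged,
           COUNT(*) AS n, 100.0 * AVG(sc.met_standard) AS pct_met
    FROM scores sc JOIN students st USING (student_id)
    WHERE sc.subject = 'reading'
    GROUP BY sc.school_year, st.econ_disadvantaged
    ORDER BY sc.school_year, st.econ_disadvantaged
""").fetchall()
for row in rows:
    print(row)
```

Keeping demographic information in one table and scores in another, linked by a student identifier, lets the same query aggregate results for all students or break them down by any subgroup.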

To keep the database current 
and usable, it is critically im- 
portant that a staff member 
be designated to maintain the 
database once it is set up. This 
includes clear and specific pro- 
cedures for data entry in a 
timely manner and periodic 
checks on data quality. In 
many cases, student perfor- 
mance data can be extracted 
or exported electronically from 
other databases (e.g., statewide 
assessment data systems) into 
the evaluation database. 

Analyzing the Data 

The most commonly used
statistics include the following:

Frequency Count. A frequency 
count provides an enumeration 
of activities, things, or people 
that have certain pre-specified 
characteristics. Examples include: 

■ Number of teachers who 
participated in professional 
development activities 
■ Number of minutes of class 
time devoted to reading 
■ Number of students meeting 
state standards in reading 
■ Number of days absent for 
the average student per school 
year 

Frequency counts can often 
be categorized (e.g., 0, 1-5, 
6-10, more than 10) in data 
analysis. 
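
A categorized frequency count of this kind can be produced with a few lines of Python. The category boundaries follow the example above; the days-absent values are hypothetical.

```python
from collections import Counter

def absence_category(days):
    """Place a days-absent value into one of the four example categories."""
    if days == 0:
        return "0"
    if days <= 5:
        return "1-5"
    if days <= 10:
        return "6-10"
    return "more than 10"

# Hypothetical days absent for eight students.
days_absent = [0, 2, 7, 12, 5, 0, 10, 3]
counts = Counter(absence_category(d) for d in days_absent)

for category in ("0", "1-5", "6-10", "more than 10"):
    print(f"{category:>12}: {counts[category]} students")
```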



Percentage. A percentage tells 
us the proportion of activities, 
things, or people that have cer- 
tain characteristics within the 
total sample. Examples include: 

■ Percent of students in grade 
four meeting reading bench- 
marks 

■ Percent of minority students 
at a school 

■ Percent of students in a 
school district living in poverty 

■ Percent of teachers in a 
state participating in profes- 
sional development activities 

Percentage is probably the 
most commonly used statistic 
to show the current status as 
well as growth over time. For 
example, a school or district 
may set a goal to increase the 
proportion of students meeting 
state benchmarks by 5 percent 
each year. 
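
A goal stated as "5 percent" can be read in two ways, and the short Python sketch below shows both. The enrollment figures are hypothetical, and the guide does not say which reading is intended, so a school should state its goal explicitly.

```python
def percent(part, total):
    """Percentage of `total` represented by `part`."""
    return 100.0 * part / total

# Hypothetical baseline: 120 of 300 students met the benchmark.
baseline = percent(120, 300)

# Reading 1: add 5 percentage points to the baseline rate.
target_points = baseline + 5.0

# Reading 2: raise the baseline rate by 5 percent of itself.
target_relative = baseline * 1.05

print(f"Baseline: {baseline:.1f}%")
print(f"Target if the goal means 5 percentage points: {target_points:.1f}%")
print(f"Target if the goal means a 5 percent relative increase: {target_relative:.1f}%")
```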

Mode. The mode is the most 
frequently occurring number 
in a data set. For example, in a 
writing assessment, if the most 
frequent rating is 3 (on a 6- 
point scale), then the mode 
rating is 3. The mode tells us 
what is the most typical case. 
In some instances, it gives us a 
better picture of what is going 
on than other statistics (e.g., 
the mean). 

Median. The median is the mid- 
dle or 50th percentile score. This 
is a good statistic to use to rep- 
resent the average when the 
score distribution is nowhere 
near normal. For example, in 
looking at attendance data, the 
median gives us a much better 
picture than the mean if a few 
students were absent for a huge 
portion of the school year.
Unlike the mean, the median is
much less affected by a few
outlying or extreme scores. 
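
The attendance example can be checked with Python's standard statistics module. The days-absent values below are hypothetical and include one student with an unusually long absence.

```python
import statistics

# Hypothetical days absent for seven students; one value is an outlier.
days_absent = [2, 3, 3, 4, 5, 6, 60]

mode_days = statistics.mode(days_absent)      # most frequent value
median_days = statistics.median(days_absent)  # middle value when sorted
mean_days = statistics.mean(days_absent)      # sum divided by count

print(f"Mode: {mode_days}, Median: {median_days}, Mean: {mean_days:.1f}")
```

Here the single long absence pulls the mean well above the median, so the median describes the typical student more faithfully.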

Mean. The mean is the most 
commonly used statistic to rep- 
resent the average in research 
and evaluation studies. It is 
derived by dividing the sum by 
the total number of units (e.g., 
teachers or students) included 
in the summation. It tells us 
what the average teacher or 
student is like with respect
to performance. The mean has 
mathematical properties that 
make it appropriate to use with 
many statistical procedures (e.g., 
test of statistical significance of a 
difference between two groups). 

Standard Deviation. Standard 
deviation shows the spread of 
a score distribution — the larger 
the standard deviation, the 
wider the spread. In survey 
data, it indicates the extent 
to which the respondents pro- 
vided similar responses or rat- 
ings. When the respondents 
provided the same or highly 
similar responses, the standard 
deviation of their responses will 
be small. A large standard devi- 
ation, on the other hand, sug- 
gests less agreement among the 
respondents. 
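
As a brief illustration, the sample standard deviation of two hypothetical sets of survey ratings (on a five-point scale) can be compared in Python. The ratings are invented for this sketch.

```python
import statistics

# Hypothetical ratings from two groups of five respondents.
similar_ratings = [4, 4, 4, 5, 4]   # respondents largely agree
varied_ratings = [1, 5, 2, 5, 3]    # respondents disagree

sd_similar = statistics.stdev(similar_ratings)
sd_varied = statistics.stdev(varied_ratings)

print(f"Similar responses: SD = {sd_similar:.2f}")
print(f"Varied responses:  SD = {sd_varied:.2f}")
```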






In most instances, data anal- 
ysis will be straightforward, 
using such descriptive statistics 
as frequency counts, averages, 
and percentages. It is impor- 
tant, however, that conclusions 
and recommendations regarding
program implementation and
outcomes be based on patterns 
and trends of results rather 
than episodic differences that 
may represent little more than 
measurement errors or random 
fluctuation over time. 

Data analysis is facilitated if 
the project has clear and mea- 
surable goals and objectives 
(Yap, 1997). For example, if 
an objective of the project is 
to increase the percentage of 
third-graders meeting state 
reading benchmarks, then it 
is a relatively simple matter 
to compute the number and 
percent of these students who 
met the benchmarks. 

In some cases, you may want 
to use "inferential" statistics to 
analyze the data, especially if 
the evaluation has a high- 
stakes purpose, such as pro- 
gram funding. This is where 
you want to be sure that the 
detected differences (positive 
or negative) are not a result of 
random fluctuation. A variety 
of statistical procedures (such 
as a t test for differences be- 
tween two groups or analysis 
of variance among three or 
more groups) are available to 
assess the statistical signifi- 
cance of a detected difference. 

If such technical expertise is 
not available among the school 
staff, external help can be ob- 
tained to perform the analysis. 



In addition, data should be 
disaggregated whenever possi- 
ble. For example, data can be 
broken down by gender, ethnic 
group, school locale (urban and 
rural), and student type (eco- 
nomically disadvantaged, lim- 
ited-English proficient, migrant, 
disabled, and so forth). 

Schools and districts with 
Title I projects are required to 
disaggregate assessment data by: 

■ Major racial and ethnic 

group 

■ Gender 

■ English proficiency status 

■ Migrant status 

■ Disability status 

■ Economically disadvantaged 

status 

Schools and districts must re- 
port the disaggregated data un- 
less the number of students in 
any group is too small to pro- 
vide statistically sound infor- 
mation or would reveal the 
identity of individual students. 
The most recent guidance (U.S. 
Department of Education, 1999, 
p. 49) from the U.S. Department 
of Education suggests that dis- 
aggregated data for subgroups 
of fewer than 10 students are 
probably not statistically sound 
and should not be reported. 
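The reporting rule described above can be sketched in a few lines. This is only an illustration under stated assumptions: the record layout, the field names (`gender`, `migrant`, `met_benchmark`), and the function name `disaggregate` are hypothetical; only the threshold of 10 students comes from the guidance cited in the text.

```python
from collections import defaultdict

MIN_GROUP_SIZE = 10  # subgroups smaller than this are not reported

def disaggregate(records, group_key):
    """Return {group: (n, percent meeting benchmark)}.

    The percent is None when the group is too small to report.
    """
    counts = defaultdict(lambda: [0, 0])  # group -> [n students, n meeting]
    for rec in records:
        tally = counts[rec[group_key]]
        tally[0] += 1
        tally[1] += 1 if rec["met_benchmark"] else 0
    report = {}
    for group, (n, met) in counts.items():
        pct = round(100 * met / n, 1) if n >= MIN_GROUP_SIZE else None
        report[group] = (n, pct)
    return report

# Hypothetical student records.
records = (
    [{"gender": "F", "migrant": False, "met_benchmark": i < 9} for i in range(12)]
    + [{"gender": "M", "migrant": False, "met_benchmark": i < 6} for i in range(10)]
    + [{"gender": "M", "migrant": True, "met_benchmark": i < 1} for i in range(3)]
)
by_gender = disaggregate(records, "gender")
by_migrant = disaggregate(records, "migrant")
print(by_gender)
print(by_migrant)
```

The suppressed percentage for the three migrant students is withheld from the report only; the underlying records remain available for internal program improvement.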

While schools are not re- 
quired to report disaggregated 
data for small samples, such data 
can and should be used for pur- 
poses of instructional or program 
improvement. In addition, there are ways of increasing the
sample size to make the disaggregated results more
representative. For example, student achievement data can be
combined over time (e.g., for two or more consecutive years)
or across grade levels for the same subject area to create a
larger student sample for data disaggregation.

Data disaggregation can help 
identify areas in which a pro- 
gram is succeeding and areas 
in which improvement is 
needed. It can also identify 
areas where equity is an issue. 
For example, disaggregation
can serve as protection against 
"creaming" — a deliberate or 
unconscious attempt on the 
part of program staff to achieve 
better results by working only 
with more advantaged or prom- 
ising students. "Creaming" is 
not only discriminatory, it also
undermines the integrity of 
standards-based reform. 

Interpreting the Data 

This is where we ask the
question: What are the data 
telling us? Contrary to a com- 
mon belief, data do not usually 
speak for themselves. The re- 
sults must be interpreted in 
an appropriate context. For 
this reason, interpretation is 
best conducted as a collabora- 
tive activity between the eval- 
uator and project staff. For 
example, differences in student 
performance over time can be 
a result of random fluctuation. 
The evaluator with statistical 
expertise can help decide 
whether that is the case or 
whether the difference is sta- 
tistically related to the inter- 
vention. Project staff, however, 
are generally in a better posi- 
tion to discuss the meaning of 



the difference and its implica- 
tions for teaching and learning. 

A wide array of test scores are 
used to measure student perfor- 
mance, including the following: 

Raw Scores. A raw score is 
simply the number of test 
items that a student answered 
correctly. For example, in a 60- 
item test, if the student re- 



sponded correctly to 45 items, 
then her raw score is 45. A raw 
score, which cannot exceed the 
total number of items in a test, 
has no inherent meaning. 

Percent Correct. This is the 
proportion of test items that 
a student answered correctly. 

In the above example, where 
the student responded cor- 
rectly to 45 of the 60 items 
in a test, her percent correct 
score is 75 — she responded 
correctly to 75 percent of the 
items included in the test. It is 
important that we do not con- 
fuse percent correct scores with 
percentile scores. 

Ratings. Ratings are typically 
provided in performance assess- 
ments. For example, writing 
samples are often rated by 
trained raters on a 6-point 
scale based on clearly defined 
rubrics or scoring guides. A 
student may receive a rating 
of 4, for example, for her writ- 
ing sample. Ratings can be pro- 
vided for the writing sample as 
a whole (holistic scoring) or for 
each of the traits of interest 
(e.g., ideas, voice, organiza- 
tion, conventions). 



Percentiles. Percentiles, a 
norm-referenced metric, indi- 
cate the percent of students in 
the norming sample — typically 
a nationally representative 
sample — who scored below a 
certain score. For example, if a 
student scores at the 60th per- 
centile, it means that 60 per- 
cent of the students in the 
norming sample scored below 
her score. Roughly speaking,
she scores better than 60 percent of the students included in
the norming sample. Percentile
scores range from 1 to 99.
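The difference between a percent correct score and a percentile can be made concrete with a small sketch. The norming sample below is hypothetical, and the function names are illustrative only; published tests use much larger norming samples and their own conversion tables.

```python
def percent_correct(raw_score, n_items):
    """Share of test items answered correctly, as a percent."""
    return 100 * raw_score / n_items

def percentile_rank(score, norming_scores):
    """Percent of the norming sample scoring below `score`, kept within 1-99."""
    below = sum(1 for s in norming_scores if s < score)
    pct = round(100 * below / len(norming_scores))
    return min(99, max(1, pct))

# Hypothetical norming sample: 50 raw scores, 10 through 59, on a 60-item test.
norming_sample = list(range(10, 60))

print(percent_correct(45, 60))              # depends only on the student's answers
print(percentile_rank(45, norming_sample))  # depends on how others scored
```

The same raw score of 45 yields a percent correct of 75 but, in this made-up norming sample, a percentile of 70, because the two numbers answer different questions.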

Quartiles. Quartiles are cut- 
points in a particular score dis- 
tribution. Technically, there 
are three quartiles— at the 25th, 
50th, and 75th percentiles — 
which divide the distribution 
into four equal portions. For 
example, the top quartile con- 
sists of students who score at 
or above the 75th percentile. 
The bottom quartile consists of 
students who score at or below 
the 25th percentile. 

Stanines. Stanines are a nine- 
point scale created and used 
by the U.S. Army during World 
War II to screen out feeble- 
minded recruits. It has since 
enjoyed widespread use in edu- 
cation for screening and selec- 
tion purposes. Stanines provide 
an efficient way of sorting stu- 
dents into nine categories. 
Quite often, students are 
grouped in low (1-3), middle 
(4-6), and high (7-9) stanines. 







Normal Curve Equivalents. 

Normal curve equivalents (NCEs) 
were originally created for use 
in the evaluation of Title I proj- 
ects. The metric is closely re- 
lated to the percentile scale. 
Like percentiles, NCE scores 
range from 1 to 99. In fact, 
the two scales coincide at three 
points: 1st, 50th, and 99th per- 
centiles. Psychometrically, the 
critical difference between the 
two metrics is that NCEs form 
an equal-interval scale whereas 
percentiles do not. Being an 
equal-interval scale, NCEs are 
appropriate for use in statisti- 
cal calculations (e.g., in the 
computation of means and 
standard deviations). 
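The relationship between the two scales can be sketched as follows. The sketch assumes the conventional definition of the NCE as a normalized score with a mean of 50 and a standard deviation of 21.06; the function names are illustrative, and in practice test publishers supply conversion tables.

```python
from statistics import NormalDist

NCE_MEAN = 50
NCE_SD = 21.06  # assumed conventional value; the text rounds this to 21

def percentile_to_nce(percentile):
    """Convert a percentile (1-99) to a normal curve equivalent."""
    z = NormalDist().inv_cdf(percentile / 100)
    return NCE_MEAN + NCE_SD * z

def nce_to_percentile(nce):
    """Convert a normal curve equivalent back to a percentile."""
    z = (nce - NCE_MEAN) / NCE_SD
    return 100 * NormalDist().cdf(z)

for p in (1, 25, 50, 75, 90, 99):
    print(f"percentile {p:2d} -> NCE {percentile_to_nce(p):.1f}")
```

The output shows the two scales agreeing at 1, 50, and 99 and diverging in between, which is why a percentile cannot simply be read as an NCE.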

Grade Equivalents. Grade 
equivalent scores form a longi- 
tudinal scale to assess the mas- 
tery of skills and knowledge 
from kindergarten through 
the 12th grade. The school year 
is conceptually divided into 10 
learning months, the three 
months in summer being con- 
sidered as one learning month. 
Grade equivalents typically 
range from K to 12. Thus, a 
grade equivalent score of 2.5 
means that the student scores 
at a learning level of second 
grade and five months. Grade 
equivalents are derived from 
a complicated scaling process, 
which can often create confu- 
sion or result in misunder- 
standing and misuse of the 
metric. Suppose a second-grade 
student taking a second-grade
test obtains a grade equivalent 
score of 3.0. What does that 
mean? It means that had the 
average third-grade student 
taken the second-grade test 
at the beginning of the school 
year, she would have gotten 
the same score as the second- 



grade student. Conversely, sup- 
pose a third-grade student tak- 
ing a third-grade test obtains a 
grade equivalent score of 2.0. It 
means that had the average sec- 
ond-grade student taken the 
third-grade test at the begin- 
ning of the school year, she 
would have gotten the same 
score as the third-grade stu- 
dent. To add to the complexity, 
grade equivalents are typically 
based on statistical projections 
rather than test scores from real 
students. Thus, the meaning of 
"falling behind grade" or "scoring 
above grade" is not as straight- 
forward as it might seem. 

Standard Scores. Standard 
scores form a longitudinal scale 
to assess the mastery of skills 
and knowledge from kinder- 
garten through the 12th grade. 
Derived from a sophisticated 
scaling process, standard scores 
link the various test levels in a 
battery of norm-referenced or 
criterion-referenced tests into a 
single scale. Normally, a student 
in a lower grade is expected to 
have a lower standard score 
than a student in a higher 
grade. As a student moves on 
to higher grades, her score is 
expected to increase. Standard 
scores can serve as cut-scores 
for various levels of proficiency 
in a core subject area (e.g., par- 
tially proficient, proficient, and 
advanced in reading or mathe- 
matics). In this sense, they are 
particularly useful in standards-
based assessments. Typically a
three-digit number, standard 
scores have other names such 
as scale scores or expanded 
standard scores. 



Two other considerations are 
critically important in interpret- 
ing test scores. First, not all test 
scores have equal intervals. For 
example, percentiles and most 
grade equivalent scores are not 
equal-interval scales. They are 
not suitable for use in the cal- 
culation of various statistical 
indices (e.g., mean and stan- 
dard deviation). This is because 
a unit on the scale may have 
different meaning and impor- 
tance relative to other units, 
depending on where it is on 
the scale. For example, on the 
percentile scale, the units are 
narrower or tighter in the mid- 
dle range than those at the 
high or low end. The NCE scale, 
on the other hand, consists of 
units of equal size along the 
entire scale. 
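A brief sketch may help show why averaging percentiles directly can mislead. It assumes the conventional NCE definition (mean 50, standard deviation 21.06) and uses two hypothetical students; the helper names are illustrative.

```python
from statistics import NormalDist, mean

NCE_SD = 21.06  # assumed conventional value for the NCE scale

def percentile_to_nce(percentile):
    return 50 + NCE_SD * NormalDist().inv_cdf(percentile / 100)

def nce_to_percentile(nce):
    return 100 * NormalDist().cdf((nce - 50) / NCE_SD)

# Two hypothetical students, at the 50th and 99th percentiles.
percentiles = [50, 99]

# Averaging the percentiles directly treats unequal units as equal.
naive_average = mean(percentiles)

# Converting to NCEs first, averaging, then converting back.
mean_nce = mean(percentile_to_nce(p) for p in percentiles)
average_via_nce = nce_to_percentile(mean_nce)

print(f"average of percentiles:       {naive_average:.1f}")
print(f"average computed on NCE scale: {average_via_nce:.1f}")
```

The two answers differ by more than ten percentile points, because the distance from the 50th to the 99th percentile is far from uniform on the underlying scale.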

Second, some test scores are 
status scores in the sense that 
they show the achievement sta- 
tus of a student or a group of 
students relative to other stu- 
dents. Percentiles, NCEs, and 
stanines are examples of status 
scores. On the other hand, lon- 
gitudinal scores indicate where 
a student or a group of stu- 
dents is on a continuum of 
skills or content knowledge. 
Standard scores and grade 
equivalents are examples of 
longitudinal scores. 

The following matrix provides 
a classification of the commonly 
used test scores along the two 
dimensions. 








Status Scores

Equal-Interval

■ Stanines
■ Normal curve equivalents
■ Percent correct
■ Ratings

Non Equal-Interval*

■ Percentiles
■ Quartiles
■ Raw scores

Longitudinal Scores

Equal-Interval

■ Standard scores

Non Equal-Interval*

■ Grade equivalents

*Not appropriate for direct statistical computation (e.g., calculation of
mean and standard deviation). Strictly speaking, raw scores are not an
equal-interval scale, even though they are often used in statistical
computation.



Evaluators commonly say that 
a difference is "significant" or 
"not significant." Typically, they 
are referring to the statistical 
significance of a difference be- 
tween the experimental or proj- 
ect students and the control/ 
comparison students. A signifi- 
cant difference in this sense 
merely means that it is unlikely 
that the detected difference is 
a result of random fluctuation. 
For example, when a difference
is said to be significant at the
.05 level, a conventional level
of significance, it means that
random fluctuation alone would
produce a difference this large
only about 5 percent of the time.

To the extent that 5 percent 
is considered a low probability, 
one may conclude that the dif- 
ference is probably not due to 
random fluctuation and, in that 
sense, is a real difference. 

However, a "real" difference 
may be small or large. It does 
not tell us anything about the 
practical or educational value 
of the difference. The value or 
practical importance of the dif- 
ference is essentially a judgment 



call, to be determined by the 
key stakeholders participating 
in the intervention. Evaluators 
have come up with some rules 
of thumb to assess the practical 
importance of a difference. A 
common rule is that if the dif- 
ference is more than one-third 
of the standard deviation, it 
may be considered as having 
some practical importance. 

The normal curve equivalent 
(NCE) scores, for instance, 
have a standard deviation of 
approximately 21. A difference 
of 7 or more NCEs may there- 
fore be considered to have 
practical importance. 
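The rule of thumb above amounts to a simple calculation, sketched here with hypothetical NCE scores. The function name and the sample data are illustrative, and the standard deviation of 21 is the rounded value given in the text.

```python
from statistics import mean

def practical_importance(project_scores, comparison_scores, scale_sd,
                         threshold=1 / 3):
    """Return (difference in means, difference in SD units, passes rule of thumb)."""
    difference = mean(project_scores) - mean(comparison_scores)
    effect = difference / scale_sd
    return difference, effect, abs(effect) >= threshold

# Hypothetical NCE scores for project and comparison students.
project = [52, 58, 61, 47, 55]
comparison = [44, 50, 49, 41, 48]

diff, effect, important = practical_importance(project, comparison, scale_sd=21)
print(f"difference = {diff:.1f} NCEs, {effect:.2f} SD, important: {important}")
```

Whether a difference that passes this check matters in a given school remains a judgment call for the stakeholders, as the text notes.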

Project staff, with intimate 
knowledge of program imple- 
mentation, can help provide a 
more complete explanation of 
the outcomes. For example, de- 
mographic changes or a sudden 
influx of transient students can 
significantly affect student 
outcomes. Such extenuating 
circumstances need to be con- 
sidered if data interpretation is 
to have credibility with project 
staff who are expected to use 
the evaluation results to im- 



prove program implementation 
and outcomes. 

Data interpretation is greatly 
facilitated if the project has set 
up measurable goals and objec- 
tives or has developed perfor- 
mance indicators that are 
readily assessable. Objectives 
or performance indicators that 
incorporate a standard or crite- 
rion make it easy to conclude 
whether the objective has been 
met. For instance, if an objec- 
tive requires 60 percent of the 
third-graders to meet state 
benchmarks, it is a relatively 
easy task to decide if the ob- 
jective is attained. 
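A criterion-based objective of this kind reduces to a single comparison. The counts below are hypothetical and the function name is illustrative.

```python
def objective_met(n_meeting, n_tested, target_percent):
    """Return (percent meeting the benchmark, whether the target was reached)."""
    percent = 100 * n_meeting / n_tested
    return percent, percent >= target_percent

# Hypothetical: 57 of 88 third-graders met the state reading benchmark.
percent, met = objective_met(57, 88, target_percent=60)
print(f"{percent:.1f} percent met the benchmark; objective met: {met}")
```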

Using Data for Program 
Improvement 

Results of impact evaluation
can serve a dual purpose: 
accountability and program 
improvement (Kushman & Yap, 
1999). Just like findings from 
program implementation evalu- 
ation, results of impact evalua- 
tion should also be useful to 
the project staff. While we 
need to know if the program 
is achieving the goals and ob- 
jectives it set out to achieve, 
it is also important that project 
staff be able to use the impact 
information to plan follow-up 
actions to further strengthen 
the program. 







[Figure 7. Percent of students meeting mathematics benchmarks. Chart not reproduced.]



Like data interpretation, data use is best conducted as a
collaborative activity between the evaluator and project staff. The
evaluator can present the data and findings in a way that is
understandable and useful to project staff, who can then develop
plans for program modification and refinement. A good way to do
this is for the evaluator and project staff to engage in an
interactive discussion on outcomes. For example, the evaluator can
prepare the impact data in a graphical format as above:

In this example, the project staff will be asked to develop a set
of narratives, using their own words, to describe what the data are
telling them. This will be followed by discussion and clarification
until a consensus or agreement is reached on what the data say
and/or imply. An action plan will then be developed to implement
follow-up activities. In the above example, there is clearly a need
to re-examine and strengthen the eighth-grade mathematics
curriculum.

The action plan may consist of the adoption or adaptation of a new
comprehensive school improvement model or the development of a
home-grown approach to school improvement. It may seek to expand
professional development of school staff.

The action plan should have a time line and should identify
individuals responsible for carrying out the planned activities.
Like any program elements, the activities should be research-based,
challenging, and doable. For example, if the corrective action
calls for further professional development, then the plan should
be based on the principles of effective practice in professional
development, including:






■ Activities are based on, and 
reflect, the best available research and practice

■ Activities are ongoing, inten- 
sive, and sustained 

■ Content has direct applica- 
tion in practice 

■ Goals are developed with 
input from participants 

■ Goals are part of a long- 
term school improvement plan 

■ There is a formative (imple- 
mentation) and summative 
(impact) evaluation process 

■ Key stakeholders are in- 
volved in both the evaluation 
and refinement of the profes- 
sional development activities 

■ There is understanding 
among stakeholders of how 
professional development fits 
in the larger, overall school 
improvement plan 



Impact Evaluation 




Monitoring 
Follow-Up Actions 

The implementation of the follow-up action plan needs to be
monitored and evaluated. Particular attention should be focused on
the intent of the corrective action. For example, if the correction
consists of increased professional development, then implementation
evaluation during the following year should include professional
development as a focus. Data should be collected to indicate
whether professional development activities have increased
(compared with the preceding year) and to assess the quality
of such activities.

The impact of the corrective 
action should be evaluated like 
other program components. This 
makes program evaluation, both 
implementation and impact, an 
integral part of the school im- 
provement cycle — a process for 
continuous improvement. 



Resources 



Bernhardt, V.L. (1998). Data analysis for comprehensive schoolwide 
improvement. Larchmont, NY: Eye on Education. 

This book presents practical tools to help educators effectively 
gather, analyze, interpret, and use data to make better decisions 
for comprehensive schoolwide improvement. Written for non- 
statisticians, the book shows the reader how to collect and use a 
variety of data such as demographics, attendance/enrollment, and 
assessment data. 

Holcomb, E.L. (1999). Getting excited about data: How to combine
people, passion, and proof. Newbury Park, CA: Corwin Press. 

This practical manual answers questions about what data to col- 
lect, how to analyze data, and how to interpret and use the data 
for schoolwide improvement. 

Levesque, K., Bradby, D., Rossi, K., & Teitelbaum, P. (1998). At your
fingertips: Using everyday data to improve schools. Berkeley, CA:
MPR Associates, Berkeley, CA: National Center for Research in 
Vocational Education, & Arlington, VA: American Association 
of School Administrators. 

This workbook is designed to help educators use a variety of data to 
better manage, monitor, and improve schools. The workbook is struc- 
tured to help teams and individuals develop performance indicator 
systems that can be used to identify strengths and weaknesses, and 
to develop educational strategies to meet educational goals. 

References 

Blum, R.E., Yap, K.O., & Butler, J.A. (1991). Onward to Excellence
impact study. Portland, OR: Northwest Regional Educational 
Laboratory. 

Fetler, M.E., & Carlson, D.C. (1985). Identification of exemplary 
schools on a large scale. In G. Austin & H. Garber (Eds.), Research 
on exemplary schools (pp. 83-96). Orlando, FL: Academic Press. 

Joint Committee on Standards for Educational Evaluation. (1994). 
The program evaluation standards: How to assess evaluations of 
educational programs (2nd ed.). Thousand Oaks, CA: Sage. 

Kushman, J.W., & Yap, K.O. (1999). What makes the difference in 
school improvement? An impact study of Onward to Excellence 
in Mississippi schools. Journal of Education for Students Placed 
at Risk, 4(3), 277-298. 






Messick, S. (1985). Progress toward standards as standards for pro- 
cess: A potential role for NAEP. Educational Measurement: Issues 
and Practice, 4(4), 16-19. 

Sudman, S. (1976). Applied sampling. New York: Academic Press. 

Tallmadge, G.K. (1982). An empirical assessment of norm-referenced 
evaluation methodology. Journal of Educational Measurement, 
19(2), 97-112. 

U.S. Department of Education. (1999). Peer review guidance for 
evaluating evidence of final assessments under Title I of the
Elementary and Secondary Education Act. Washington, DC: Author.

Yap, K.O. (1980). Pretest-posttest correlation and the special re- 
gression model. In American Statistical Association: 1980 proceed- 
ings of the social statistics section (pp. 236-240). Washington, DC: 
American Statistical Association. 

Yap, K.O. (1980, September). Pretest-posttest variance differentials 
and the special regression model. A paper presented at the annual 
meeting of the American Psychological Association, Montreal, 
Canada. 

Yap, K.O. (1997). Guidebook on developing performance indicators. 
Portland, OR: Northwest Regional Educational Laboratory. 

Yap, K.O., Estes, G.D., & Hansen, J.B. (1979, April). Effects of data 
analysis methods and selection procedures in regression models.
A paper presented at the annual meeting of the American
Educational Research Association, San Francisco, CA.

Yap, K.O., Estes, G.D., & Nickel, P.R. (1988). A summative evalua- 
tion of the Kamehameha elementary education program as dis- 
seminated in Hawaii public schools. Portland, OR: Northwest 
Regional Educational Laboratory. 



Workshop Requirements 

The following are general
requirements for this training
activity: 

Audience: District and school- 
level evaluators and key proj- 
ect staff responsible for the 
evaluation of whole-school 
reform efforts. 

Time: Two to four hours 

Group size: 20 to 30 participants 

Equipment: An overhead pro- 
jector and Chart-pack paper 

Materials: Transparencies, par- 
ticipant handouts, and a copy 
of guidebook (desired) 

Objective: To build local capac- 
ity in evaluating whole-school 
reform efforts through an in- 
teractive presentation and dis- 
cussion on impact evaluation. 

Begin the discussion by stat- 
ing the primary purpose of im- 
pact evaluation — to find out if 
the intervention (whole-school 
reform) has made a difference 
for schools, teachers and, most 
important, students. 

Then use the transparencies to 
continue with the presentation 
and discussion. The presentation 
should be as interactive as possi- 
ble. Since the audience is likely 
to consist of people with consid- 
erable experience and expertise 
with program evaluation, you 
should invite questions and 
comments from the audience 
as much as possible. 






Depending on the type of 
audience you have and how
detailed the presentation/ 
discussion needs to be, this 
session can last two to four 
hours. For district or school 
staff responsible for program 
evaluation, this can be made 
a work session in which the 
participants will complete 
the small-group activities 
as preplanning for their 
evaluation work. 

Instructions for Impact 
Evaluation Transparencies

Each transparency is related
to a part of the guidebook. 
You should familiarize yourself 
with the contents of the guide- 
book before you use the trans- 
parencies. The guidebook 
generally gives you a pretty 
good idea about what you 
should say when you show 
a particular transparency. 



Transparency #1 

Explain that there are many 
ways to find out if an interven- 
tion has made a difference. 

Each evaluation model uses a 
different method and rationale 
to determine what things 
would have been like had there 
been no intervention. The dif- 
ference between actual and ex- 
pected outcomes is a measure 
of program impact. 






The models are also different 
in that the results they produce 
allow us to attribute, with dif- 
fering degrees of confidence, 
the outcomes to the interven- 
tion. They also differ with re- 
spect to feasibility, cost, and 
obtrusiveness. Thus, each has 
advantages and disadvantages. 



Discuss the advantages and dis- 
advantages. Refer to Pages 50 
through 62 in the guidebook. 

Generally speaking, the models 
are presented in order of scien- 
tific rigor. The pretest-posttest 
model is the least rigorous and 
the control group model — a 
true experimental design — is 
the most rigorous. In a layper- 
son's perspective, one may say 
that the models answer the fol- 
lowing questions: 

Pretest-posttest model — 

Are things getting better? 

Comparison group model — 

Are you making a difference? 

Regression model — Are you 
doing better than expected? 

Control group model— Are you 
really making a difference? 

Transparency #2 

Present the pretest-posttest 
model as one that is highly 
doable and reasonable when 
evaluation resources and ex- 
pertise are limited. It measures 
outcomes at a minimum of two
time points — pretest and 
posttest. However, it is best 
conducted with measures re- 
peated at regular intervals, 
for example, each fall and 
spring or annually. 

The assumption of this model 
is that, without the interven- 
tion, things at posttest time 
will be the same as they were at 
pretest time. Teachers will teach 
the same way and students will 
learn the same way. Any differ- 
ence will, therefore, likely be a 
result of the intervention. 



Briefly discuss the advantages 
and disadvantages of the model 
as discussed on Page 51, in the 
guidebook. 

Explain that the best way to 
use the pretest-posttest model 
is not just to do a pretest and 
a posttest. Rather, it should be 
repeated over a long period of 
time — preferably over several 
years to show longitudinal pat- 
terns and trends. Even though 
this model does not provide a 
strong scientific basis for attri- 
buting impact to the interven- 
tion, a consistently positive 
trend can be compelling evidence 
that the program is working. 
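One informal way to look for a consistent trend in repeated measures is sketched below. The yearly figures are hypothetical, the function names are illustrative, and the least-squares slope is simply one convenient summary, not a method prescribed by the guidebook.

```python
def yearly_changes(values):
    """Change from each measurement to the next."""
    return [later - earlier for earlier, later in zip(values, values[1:])]

def trend_slope(values):
    """Least-squares slope of the values against time points 0, 1, 2, ..."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    numerator = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    denominator = sum((x - x_mean) ** 2 for x in range(n))
    return numerator / denominator

# Hypothetical percent of students meeting benchmarks over five springs.
percent_meeting = [48, 52, 51, 57, 61]

changes = yearly_changes(percent_meeting)
slope = trend_slope(percent_meeting)
print("year-to-year changes:", changes)
print(f"average gain per year (trend): {slope:.1f} percentage points")
```

A single pretest-posttest pair would show only one of these changes; the series as a whole shows a steady rise despite one small dip, which is the kind of pattern the model is best used to reveal.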

See pages 50-53 in the guide- 
book. 

Transparency #3 

Present the comparison group 
model as one with relatively 
strong scientific rigor. It is 
generally doable when the 
school can find an appropriate 
comparison group — a school or 
groups of students with charac- 
teristics similar to those of stu- 
dents in the intervention. At 
the very least, the two groups 
(or schools) should be demo- 
graphically similar, including 
such factors as poverty level, 
percent of minority students, 
LEP population, and so on. 

The assumption of this model 
is that, without the interven- 
tion, things (including the way 
teachers teach and the way stu- 
dents learn) will be very much 
alike, if not identical, at the 
project and comparison schools. 
Any difference found at the end 
of the intervention will, there- 
fore, be attributable to the in- 
tervention. 






One of the challenges of 
using this model is finding a 
comparison group that is simi- 
lar to the intervention group 
in all relevant respects and one 
that is willing to participate in 
the necessary data collection 
activities. In some cases, some 
sort of incentive (such as a sum- 
mary of findings of the study) 
may need to be provided to 
get such cooperation. 

Briefly discuss the advantages 
and disadvantages of the model 
as described on Page 55 in the 
guidebook. 

Transparency #4 

Present the regression model 
as one that is of great interest 
to evaluators and researchers. 
While it is more doable in a 
school setting than people 
might think, it does require 
statistical skills not normally 
available among school staff. It 
is likely that some external as- 
sistance will be needed if this 
model is chosen. 

The assumption of this model 
is that the regression procedure 
can provide a highly accurate 
prediction of what things would 
have been like in the absence 
of the intervention, especially 
when all relevant variables are 
accounted for in the equation. 
The difference (as shown in the 
transparency) between the pre- 
dicted status and actual status 
at the end of the intervention 
period is attributable to the in- 
tervention. 

The unit of measurement and 
analysis can be individual stu- 
dents, schools, or other entities 
of interest. For example, indi- 
vidual student scores can be 



used to establish the regression 
equation. This will probably be 
done by grade level. The proce- 
dure will then provide a pre- 
dicted score for each student. 

On a larger scale, schools can 
be used as the unit in setting 
up the equation. In that case, 
school averages, for both stu- 
dent performance and demo- 
graphics, will be used as the 
scores to be included in the re- 
gression equation. Again, this is 
best conducted by grade level. 
The equation will then provide 
a predicted score for each grade 
level for the school as a whole. 
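The idea of comparing actual with predicted status can be sketched with schools as the unit of analysis. This is a deliberately simplified illustration: it uses a single predictor (percent of low-income students), whereas an actual evaluation would likely include pretest scores and several demographic variables. All figures and names are hypothetical, and the schools used to fit the line are assumed not to have received the intervention.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    slope = (sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y))
             / sum((a - x_mean) ** 2 for a in x))
    intercept = y_mean - slope * x_mean
    return slope, intercept

# Hypothetical schools without the intervention, one grade level:
# percent of low-income students and mean NCE score.
poverty = [20, 40, 60, 80]
mean_nce = [60, 52, 44, 36]
slope, intercept = fit_line(poverty, mean_nce)

# Hypothetical project school.
project_poverty, project_actual = 70, 46
predicted = intercept + slope * project_poverty
impact_estimate = project_actual - predicted

print(f"predicted mean NCE: {predicted:.1f}")
print(f"actual minus predicted: {impact_estimate:.1f}")
```

A positive difference suggests the school is doing better than expected given its demographics; how much confidence that deserves depends on how well the equation accounts for the relevant variables.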

Briefly discuss the advantages 
and disadvantages of the model 
as described on Page 57 in the 
guidebook. 

Transparency #5

Introduce the control group 
model as a true experimental 
design with the highest level 
of scientific rigor. Random as- 
signment of students or other 
entities of interest to the inter- 
vention and control groups can 
potentially rule out all extrane- 
ous factors that may affect the 
outcomes, making it easy to at- 
tribute program impact. 

The assumption of the model 
is that the project and control 
groups are truly equivalent in 
all relevant respects and, with- 
out the intervention, we would 
expect the same things to hap- 
pen in both groups. If there is 
a difference at the end of the 
intervention period, that will be 
attributed to the intervention. 

A challenge of the model is 
random assignment of students 
to project and control groups. 
This is rarely, if ever, feasible 




in an ordinary school setting. 
Randomly assigning larger enti- 
ties (e.g., classes or schools) is 
sometimes more feasible. How- 
ever, with larger entities, even 
random assignment may not re- 
sult in truly equivalent groups. 

The control group model, 
even though rarely feasible, 
serves as an ideal that schools 
can approximate to the extent 
possible. When this model is 
used, we can attribute the dif- 
ference between the two groups, 
as shown in the transparency, 
to the intervention with a 
great deal of confidence. 

Briefly discuss the advan- 
tages and disadvantages of the 
model as described on Page 60 
in the guidebook. 

Close the discussion of eval- 
uation models by directing at- 
tention to Impact Evaluation 
Handout #1, which summarizes 
the advantages and disadvan- 
tages of each model. 

Transparency #6 

Walk the audience through 
the evaluation process, point- 
ing out that steps are interac- 
tive and build on each other. 

It is important to point out
that the project needs to set
up measurable goals and objectives
or performance indicators
that can be assessed — those 
with some sort of standards 
or criteria built in. 

Schools will probably want to 
look at outcomes at more than 
one level. For example, they 
might want to find out whether, 
as a result of the whole-school 
reform: 






■ School policy and practice 
have changed, particularly 
with respect to professional 
development and allocation 
of time and resources 

■ Instructional practice has 
changed 

■ Student performance patterns 
have changed 

Students are the ultimate 
beneficiaries of school reform. 

It would be difficult to justify 
leaving out student outcomes 
in an impact evaluation of 
whole-school reform effort. 

We need to look at the evalua- 
tion process from a cost-benefit 
perspective. For example, some 
models and data collection 
methods are more expensive 
or time-consuming than others. 
We need to make sure the ex- 
pected benefits to the target 
groups (students, teachers, and 
schools) are commensurate 
with the cost incurred. 

All of the steps, but especially 
the last three steps, in the pro- 
cess are best conducted as a 
collaborative effort between 
the evaluator and project or 
school staff. The evaluator 
can present the results and 
the project staff can bring 
their craft knowledge about 
the reform effort to help inter- 
pret the findings and to plan 
follow-up actions. Ultimately, 
only project staff — not the 
evaluator — can use evaluation 
data to improve the project. 

Relevant contents are provided 
on Pages 63-78 of the guidebook. 



Transparency #7

Explain that there are many 
ways of collecting evaluation 
data. Some are better suited 
for gathering certain types of 
data as discussed on Pages 65- 
72 in the guidebook. Some are 
more expensive than others. 
Each has advantages and dis- 
advantages. Again, cost and 
benefits should be considered 
in data collection. Generally, 
more indepth information costs 
more and is more time-consum- 
ing to collect. For example, a 
written survey is usually less 
expensive than onsite observa- 
tion but may provide only a 
very global picture of program 
implementation. 

Briefly discuss each data col- 
lection method as described on 
Pages 67-68 in the guidebook. 

At this point you may want 
to have the participants peruse 
the handout on data collection 
(Data Collection Matrix) and so- 
licit comments and observations. 

Transparency #8

Discuss data collection consid- 
erations as described on Pages 
65-72 in the guidebook, rein- 
forcing the notion that we 
want to collect data that are 
valid, reliable, and useful in 
the most cost-effective way. 

Selecting the most appropriate 
model will give us the most valid 
data for the intended purpose. 

Instruments must be valid, 
reliable, and cost-effective for 
the type of data we are collect- 
ing. For example, a written sur- 
vey on teaching practice may 
be less expensive, but onsite 



observation (which is more ex- 
pensive) can provide more ac- 
curate and useful data. 

Some data sources may be 
more valid than others. As a 
general rule, we should go to 
the primary source. For exam- 
ple, if we want to know the 
extent to which teachers par- 
ticipate in professional devel- 
opment activities, the data 
source should be teachers, 
not a district administrator. 

Sampling can reduce the cost
of data collection. In some cases,
a carefully followed-up sample
might even provide more accu-
rate data than surveying everyone,
especially where low response
rates would be a serious problem.

Multiple measures give us a 
more comprehensive and there- 
fore more accurate picture of 
program implementation and 
outcomes. 

Discuss data quality, data 
collection schedule, and data 
management as described on 
Pages 70-72. 

Transparency #9 

Briefly discuss the difference 
between descriptive statistics 
and inferential statistics. In 
many cases, the use of descrip-
tive statistics (e.g., frequency
counts, percentages, averages) 
may suffice, especially when 
the evaluation does not have 
high stakes. 

When it is necessary (such as 
in a high-stakes evaluation) to 
be sure that the impact is not 
a result of random fluctuation, 
inferential statistical proce- 
dures may be needed. In some 
cases, a t test to assess the sta- 




tistical significance of the dif- 
ference between the project 
and comparison group may be 
all that is needed. In others, 
analysis of variance or other 
more sophisticated procedures 
to detect a "real" difference 
may be necessary. 
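
As a rough illustration of the t test mentioned above, the sketch below computes Welch's t statistic (a variant that does not assume equal variances) for a project group and a comparison group, using only the Python standard library. The scores and the function name `welch_t` are hypothetical, not taken from the guidebook. A real analysis would also look up a p value in a t table or a statistics package.

```python
import math
import statistics


def welch_t(project, comparison):
    """Return (t statistic, approximate degrees of freedom) for two groups."""
    n1, n2 = len(project), len(comparison)
    m1, m2 = statistics.mean(project), statistics.mean(comparison)
    # Sample variances (n - 1 in the denominator).
    v1, v2 = statistics.variance(project), statistics.variance(comparison)
    se_sq = v1 / n1 + v2 / n2
    t = (m1 - m2) / math.sqrt(se_sq)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = se_sq ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return t, df


# Hypothetical test scores for illustration only.
project_scores = [52, 61, 58, 47, 66, 59, 63, 55]
comparison_scores = [48, 50, 53, 45, 57, 49, 51, 46]

t_stat, df = welch_t(project_scores, comparison_scores)
print(f"t = {t_stat:.2f}, df = {df:.1f}")
```

The larger the absolute value of t for a given number of degrees of freedom, the less likely it is that the difference between the groups is a random fluctuation.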

At this point, you may want 
to talk about different styles 
of data analysis. Data can be 
made to reveal the truth — 
which is what we are after — 
in various ways. For example, 
they can be squeezed, massaged, 
or brutally tortured to "confess" 
the truth as we see it. 

You may also gently remind 
your audience that while some 
facets of the truth may readily 
ooze out of the data, other 
facets use data as a shield to 
hide their identity. Sophisti- 
cated, high-voltage statistical 
procedures may be needed to 
penetrate the shield to get to 
the whole truth. Even then, one 
should be reminded that there 
are lies, damned lies, and then 
statistics. 

Back on a more serious note, 
you may want to discuss the dif- 
ference between statistical sig- 
nificance and the practical 
importance of any detected 
difference. See Page 76 in 
the guidebook. 
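
One common way to express practical importance is a standardized effect size. The sketch below uses Cohen's d, which is an assumption on our part and is not named in the guidebook: it divides the difference between two group means by their pooled standard deviation. The scores and the function name `cohens_d` are hypothetical.

```python
import math
import statistics


def cohens_d(group_a, group_b):
    """Difference in means divided by the pooled standard deviation."""
    n1, n2 = len(group_a), len(group_b)
    v1, v2 = statistics.variance(group_a), statistics.variance(group_b)
    pooled_sd = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd


# Hypothetical scores for illustration only.
project_scores = [52, 56, 60]
comparison_scores = [50, 54, 58]

d = cohens_d(project_scores, comparison_scores)
print(f"Cohen's d = {d:.2f}")
```

A difference can be statistically significant in a very large sample and still correspond to a small d, which is the distinction the discussion is meant to bring out.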

Evidence is more compelling 
when there is a consistent pat- 
tern or trend. For example, with 
the pretest-posttest model 
(which is generally less rigor- 
ous than the other models), if 
the student performance shows a 
consistently positive trend over 
multiple years, one may quite 
confidently say that something is 
going right with the intervention. 



Whenever feasible, data should 
be disaggregated. Title I requires 
data to be broken down by gen-
der, ethnicity, poverty, language,
migrant status, and disability 
status. Disaggregated data pro- 
vide us with a better understand- 
ing of how the intervention is 
working and can also reveal eq- 
uity issues which may other- 
wise not surface. See Pages 
73-74 in the guidebook. 
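
A minimal sketch of disaggregation, assuming student records are available as simple dictionaries. The field names (`gender`, `met_benchmark`), the function name, and the records themselves are hypothetical.

```python
from collections import defaultdict


def percent_meeting_by_group(records, group_key):
    """Percent of students meeting the benchmark within each subgroup."""
    totals = defaultdict(int)
    met = defaultdict(int)
    for rec in records:
        group = rec[group_key]
        totals[group] += 1
        if rec["met_benchmark"]:
            met[group] += 1
    return {group: 100.0 * met[group] / totals[group] for group in totals}


# Hypothetical records for illustration only.
students = [
    {"gender": "F", "met_benchmark": True},
    {"gender": "F", "met_benchmark": True},
    {"gender": "F", "met_benchmark": False},
    {"gender": "F", "met_benchmark": True},
    {"gender": "M", "met_benchmark": True},
    {"gender": "M", "met_benchmark": False},
    {"gender": "M", "met_benchmark": False},
    {"gender": "M", "met_benchmark": False},
]

by_gender = percent_meeting_by_group(students, "gender")
print(by_gender)
```

In this made-up data set half of all students meet the benchmark, but the subgroup rates differ widely, which is the kind of equity issue an overall figure can hide.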

Transparency #10 

Explain that there are only 
a handful of statistical indices 
in common use. They are fre- 
quency count, percentage, 
mode, median, and mean/ 
standard deviation. Go over 
this quickly because most 
people in the audience proba- 
bly already know these indices. 

Frequency count tells us, for 
example, how many teachers par- 
ticipated in how many profes- 
sional development activities, 
how many minutes of the class 
time were devoted to reading, 
how many students were absent 
for how many days, and so on. 
Frequency counts can often be 
categorized (0, 1-5, 6-10, more 
than 10) in data analysis. 
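
The categories above (0, 1-5, 6-10, more than 10) can be sketched as a small counting routine. The absence figures and the function name `absence_category` are hypothetical.

```python
from collections import Counter


def absence_category(days):
    """Map a number of days absent to one of the four categories."""
    if days == 0:
        return "0"
    if days <= 5:
        return "1-5"
    if days <= 10:
        return "6-10"
    return "more than 10"


# Hypothetical days absent for ten students.
days_absent = [0, 2, 0, 7, 12, 3, 5, 10, 0, 25]

counts = Counter(absence_category(d) for d in days_absent)
for category in ["0", "1-5", "6-10", "more than 10"]:
    print(category, counts[category])
```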

Percentage tells us the pro- 
portion of teachers who partic- 
ipated in professional devel- 
opment activities, the propor- 
tion of students at various 
achievement levels (such as 
meeting state reading bench- 
marks), the proportion of stu- 
dents who dropped out, and so 
on. Percentage is probably the 
most commonly used statistic 
to show current status as well 
as growth over time. For exam- 
ple, a school or district may set a 
goal to increase the proportion 



of students meeting state bench- 
marks by 5 percent each year. 

Technically, mode is the most 
frequently occurring number 
in a data set. For example, in a 
writing assessment, if the most 
frequent rating is 3 (on a 6-point 
scale) then the mode rating is 
3. Mode tells us what is the 
most typical case. In some 
cases, it gives us a better pic- 
ture of what is going on than 
the mean. 

The median is the middle or 
50th percentile score. This is a 
good statistic when the score 
distribution is nowhere near 
normal. For example, in looking 
at attendance data, the median 
gives us a much better picture 
than the mean if a few students 
were absent for a huge portion 
of the school year. The median 
is much less affected by a few 
outlying or extreme scores. 
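
A short sketch of the attendance example, using made-up figures in which two students missed a large part of the year.

```python
import statistics

# Hypothetical days absent; two students missed most of the year.
days_absent = [2, 3, 1, 4, 2, 3, 90, 85]

mean_days = statistics.mean(days_absent)
median_days = statistics.median(days_absent)

print(f"mean = {mean_days}, median = {median_days}")
```

The two extreme values pull the mean far above what is typical for this group, while the median stays close to the experience of most students.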

Mean and standard deviation 
are the most commonly used 
statistics in research and evalu- 
ation studies. The mean tells us 
the average — what the average 
teacher or student is like with 
respect to performance. For ex- 
ample, when we want to find 
out the difference between two 
groups (say, project and compar- 
ison groups) we compare the 
means for the two groups. 

Standard deviation shows the 
spread of the score distribution — 
the larger the standard devia- 
tion, the wider the spread. In 
survey data, it indicates the ex- 
tent to which the respondents 
provided similar responses or 
ratings. When the respondents 
provided the same or similar re- 
sponses, the standard deviation 
of their responses will be small. 






A larger standard deviation, on 
the other hand, suggests less 
agreement among the respon- 
dents. 
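
To illustrate, the sketch below compares two hypothetical sets of survey ratings on a 5-point scale: one in which respondents largely agree and one in which they do not.

```python
import statistics

# Hypothetical ratings on a 5-point scale.
similar_ratings = [4, 4, 4, 5, 4, 4]  # respondents largely agree
mixed_ratings = [1, 5, 2, 5, 1, 4]    # respondents disagree

sd_similar = statistics.stdev(similar_ratings)
sd_mixed = statistics.stdev(mixed_ratings)

print(f"similar: {sd_similar:.2f}, mixed: {sd_mixed:.2f}")
```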

Transparency #11 

Show the transparency and go 
over the items quickly. Again, 
most people in the audience 
probably already know their
test scores well.

Point out that ratings are 
typically used in performance- 
based assessment (e.g., writing 
assessment). Typically, the rat- 
ings are based on some well- 
developed scoring guide or 
rubrics. The ratings are usu- 
ally single-digit numbers. 

Point out that some test 
scores are not equal-interval 
scores, which means that they 
cannot be used in statistical 
calculation. For example, it is 
not appropriate to add and di- 
vide percentile scores to get an 
average. To get an average per- 
centile, we should do the com- 
putation with Normal Curve 
Equivalent (NCE) scores and 
then convert the average NCE 
to a percentile score. 
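
The conversion described above can be sketched as follows. It assumes the usual definition of the NCE scale (mean 50, standard deviation 21.06, based on the normal curve), which the guidebook does not spell out here. The function names and the sample percentiles are hypothetical. Percentiles must be strictly between 0 and 100.

```python
from statistics import NormalDist, mean

NCE_SD = 21.06  # assumed standard deviation of the NCE scale
STANDARD_NORMAL = NormalDist()


def percentile_to_nce(percentile):
    """Convert a percentile rank (between 0 and 100, exclusive) to an NCE."""
    z = STANDARD_NORMAL.inv_cdf(percentile / 100.0)
    return 50.0 + NCE_SD * z


def nce_to_percentile(nce):
    """Convert an NCE back to a percentile rank."""
    z = (nce - 50.0) / NCE_SD
    return 100.0 * STANDARD_NORMAL.cdf(z)


def average_percentile(percentiles):
    """Average in NCE units, then convert the mean NCE to a percentile."""
    mean_nce = mean(percentile_to_nce(p) for p in percentiles)
    return nce_to_percentile(mean_nce)


# Hypothetical percentile scores for three students.
student_percentiles = [10, 50, 99]
print(f"naive average: {mean(student_percentiles):.1f}")
print(f"via NCEs:      {average_percentile(student_percentiles):.1f}")
```

The two results differ because percentile ranks are bunched together near the middle of the distribution and spread out at the extremes.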

Strictly speaking, only sta- 
nines, NCEs, and standard scores 
are equal-interval scores. 

Also, test scores can be di- 
vided into status (horizontal) 
and longitudinal (vertical) 
scores. The status scores (e.g., 
percentiles, quartiles, stanines, 
and NCEs) compare the perfor- 
mance of a group of students 
with that of their peers. Longi- 
tudinal scores (grade equiva- 
lents and standard scores) 
show or capture a vertical 
scale or continuum of knowl- 



edge or skills by grade level or 
a hierarchy of difficulty. 

Transparency #12 

Show Transparency #12 when 
you do Small-Group Activity 
#3. See Small-Group Activity 
#3 for details. 

Transparency #13 

Use Transparency #13 when 
you do Small-Group Activity 
#4. See Small-Group Activity 
#4 for details. 

Impact Evaluation 
Small-Group Activities 

Each small-group activity is
designed to reinforce or 
stimulate the discussion on 
a particular topic or concept. 
They may be conducted before 
or after the discussion. If the 
activity is done before the dis- 
cussion, the topic should be 
briefly introduced first. As a 
presenter, you should guide the 
participants through the activ- 
ity and then lead an interactive 
discussion of the results of the 
groups' work, drawing from the 
contents of the guidebook as 
appropriate to reinforce and/ 
or enrich the discussion. 

The small-group activity can 
also be scheduled to follow a 
more detailed discussion of the 
topic. In this case, the activity 
provides a way for the partici- 
pants to apply what they have 
learned in the presentation and 
discussion. 






Small-Group Activity #1
(20 minutes) 

This activity can be conducted 
before or after your presenta-
tion on data collection (Impact
Evaluation Transparencies #7 
and #8). If it is conducted be- 
fore the presentation, its pur- 
pose is to stimulate thinking 
about data collection issues. 

If it is done after the presenta- 
tion, its purpose is to reinforce 
ideas and concepts covered in 
your presentation. 

Divide the audience into 
groups of about five people. 

The group can consist of mem- 
bers of a school team or just 
participants selected by var-
ious means to form a group.

The task of the group is to 
complete the data collection 
form (Impact Evaluation Hand- 
out #3) to reinforce what they 
have discussed about data col- 
lection, including methods, 
data sources, and instruments. 
The small group should identify 
a recorder and/or reporter to 
share the results with the en- 
tire group when the activity 
is completed. Allow 15 minutes 
for the small groups to complete 
the task and five minutes to 
share. To save time, you may 
ask only two or three volun- 
teer groups to share. 

Refer the participants to parts 
of the guidebook that discuss 
evaluation models and data col- 
lection methods (for example, 
the data collection matrix). 

As discussed in the guidebook, 
data collection methods can in- 
clude document review, inter- 
view (in person or over the tele- 
phone), written survey, focus 



groups, observation, and assess- 
ment of student performance. 

Data sources can include ex- 
isting documents and people, 
including students, teachers, 
school administrators, parents, 
and community members. 

Under "instrument," the small
groups can provide generic labels
(such as "teacher survey") or ti-
tles of existing instruments, as
in the measurement of student
achievement by a statewide test.

At the end of the activity, 
you should briefly summarize 
the results and point out any 
common themes, patterns, or 
trends. If the concepts did not 
come up in the group discussion, 
you should briefly discuss the 
advantages and disadvantages 
of each data collection method 
with respect to validity, relia- 
bility, feasibility, cost, and 
data burden. 

Small-Group Activity #2 
(20 minutes) 

This activity can be conducted 
prior to or following your pre- 
sentation on data analysis (Im- 
pact Evaluation Transparencies 
#9, #10, and #11). If it is con- 
ducted before the presentation, 
its purpose is to stimulate think- 
ing about data analysis issues. 
If it is done after the presenta- 
tion, its purpose is to reinforce 
ideas and concepts covered in 
your presentation. 

Divide the audience into small 
groups of about five people. The 
group can consist of members 
of a school team or just partici- 
pants selected by various means 
to form a group. 



The task of the group is to 
complete the data analysis form 
(Impact Evaluation Handout #4) 
to reinforce what they have 
discussed about data analysis, 
including the use of descriptive 
and inferential statistics. 

The small group should iden- 
tify a recorder and/or reporter 
to share the results with the 
entire group when the activity 
is completed. Allow 15 minutes 
for the small groups to complete 
the task and five minutes to 
share. To save time, you may 
ask only two or three volunteer 
groups to share. 

Explain that under the col- 
umn heading of type of data, 
we are talking about whether it 
would be survey data, interview 
data, observation data, student 
outcome data, or others. 

Under data analysis method, 
members of the group should 
discuss whether they would 
compute frequencies, percent- 
ages, and/or averages. Would 
they set a standard or crite- 
rion? For example, would they 
want to see at least 50 percent 
of the teachers changing their 
instructional practice in accor- 
dance with what is specified 
in the school reform model? 
Would they look at student 
outcomes in addressing the 
evaluation question? How 
can they say instruction has 
improved unless students are 
learning better? Would they 
do any comparative analysis? 

Would they be dealing with
open-ended, qualitative data, 
such as descriptions of changes 
in practice? Would they just 
summarize the verbal data? 






Small-Group Activity #3 
(30 minutes) 

Show Impact Evaluation Trans- 
parency #12 when you do Small- 
Group Activity #3. 

Divide the audience into 
groups of about five people. 

The group can consist of mem- 
bers of a school team or just 
participants selected by various 
means to form a group. 

The task for members of the 
group is to review the student 
outcome data (percent of stu- 
dents meeting state bench- 
marks) and to state in their 
own words what the data mean 
to them. Collectively, they are 
to develop three narratives or 
statements that indicate what 
the data say or imply. Typically, 
these narratives are then used 
as the basis for developing im- 
provement plans. 

The small group should iden- 
tify a recorder and/or reporter 
to share the results with the 
entire group when the activity 
is completed. Allow 25 minutes 
for the small groups to complete 
the task and five minutes to 
share. To save time, you may 
ask only two or three volun- 
teer groups to share. 

At the end of the activity, 
you should briefly summarize 
the results and point out any 
common themes and findings. 
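
As one way to prepare for the summary, the sketch below computes year-to-year and net changes (in percentage points) from the figures in the Activity #3 handout table. The variable and function names are hypothetical. The numbers are those shown on the handout.

```python
# Percent of students meeting state benchmarks, from the Activity #3 handout.
years = ["94/95", "95/96", "96/97", "97/98"]
benchmark_percent = {
    "Fourth (Reading)": [43, 38, 46, 55],
    "Eighth (Reading)": [34, 31, 40, 32],
    "Fourth (Math)": [24, 36, 44, 55],
    "Eighth (Math)": [35, 29, 20, 29],
}


def summarize(series):
    """Return year-to-year changes and the net change over the period."""
    yearly_changes = [later - earlier for earlier, later in zip(series, series[1:])]
    return {"net_change": series[-1] - series[0], "yearly_changes": yearly_changes}


summary = {name: summarize(values) for name, values in benchmark_percent.items()}
for name, result in summary.items():
    print(name, result["yearly_changes"], "net:", result["net_change"])
```

Listing the changes side by side makes it easier to see which grades show a consistent trend and which fluctuate from year to year.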






Small-Group Activity #4 
(30 minutes) 

Use Impact Evaluation Trans- 
parency #13 when you do Small- 
Group Activity #4. 

Divide the audience into 
groups of about five people. 

The group can consist of mem- 
bers of a school team or just 
participants selected by var-
ious means to form a group.

The task of the group is to 
review the student data dis- 
played in a graph. The same 
data are provided in a tabular 
format for Small Group Activity 
#3. The group is to develop key 
findings based on the data in 
response to the evaluation ques- 
tion of whether student perfor- 
mance is improving over time. 

Based on the key findings, 
the group will then decide 
what corrective action, if any, 
should be taken. The group will 
also decide who will be respon- 
sible for implementing the cor- 
rective action and when the 
action will be taken. 



The small group should iden-
tify a recorder and/or reporter
to share the results with the 
entire group when the activity 
is completed. Allow 25 minutes 
for the small groups to complete 
the task and five minutes to
share. To save time, you may 
ask only two or three volunteer 
groups to share. 

At the end of the activity, 
you should briefly summarize 
the results and point out any 
common themes, patterns, or 
trends. If none of the groups 
mentioned it, you should point 
out that the eighth-grade cur- 
riculum clearly needs to be ex- 
amined and perhaps 
restructured. 






Impact Evaluation Transparencies (Northwest Regional Educational Laboratory)

Transparency #1: Evaluation Model
Transparency #2: Pretest-Posttest Model [chart; vertical axis: Percent]
Transparency #3: Comparison Group Model [chart; vertical axis: Percentage of Students]
Transparency #4: Regression Model [chart; vertical axis: Standard Scores]
Transparency #5: Control Group Model
Transparency #6: [title illegible]
Transparency #7: Data Collection Methods
Transparency #8: Data Collection Considerations
Transparency #9: [title illegible]
Transparency #11: [title illegible]
Transparency #11b: Normal Curve Equivalents
Transparency #12: Percent of Students Meeting Mathematics Benchmarks [chart; vertical axis: Percent of Students]




Handout: Advantages and Disadvantages of Evaluation Models 



Model: Pretest-Posttest

Description: This model provides an expectation of program outcomes based on the current status.

Advantages:
■ Highly feasible in a school setting
■ Shows growth against baseline
■ Shows patterns and trends if conducted longitudinally
■ Can assess relative or absolute growth

Disadvantages:
■ May lack rigor; difficult to attribute effects to program
■ Difficult to control extraneous factors


Model: Comparison Group

Description: This model provides an expectation of program outcomes based on a comparable group.

Advantages:
■ Relatively strong scientific rigor
■ Can attribute effects to program
■ Can compare progress toward meeting common criteria (e.g., state standards)

Disadvantages:
■ May be difficult to find a comparable group
■ Selected groups may differ in some important but unknown ways
■ Increased data collection burden


Model: Regression

Description: This model uses a statistical method to predict or project program outcomes.

Advantages:
■ Relatively strong scientific rigor
■ Can statistically control for extraneous factors affecting outcomes
■ Does not require existing control or comparison groups

Disadvantages:
■ Feasibility depends on availability of sufficient archival data
■ Model can be misused
■ Statistical expertise generally not available among existing school/district staff


Model: Control Group

Description: This model provides an expectation of program outcomes based on what happens in an equivalent or control group.

Advantages:
■ Has the strongest scientific rigor with random assignment of students to intervention
■ Can statistically control for extraneous factors affecting outcomes
■ Can attribute effects to program
■ Can compare progress toward meeting common criteria (e.g., state standards)

Disadvantages:
■ May be difficult, if not impossible, to find an equivalent group
■ Random assignment is typically not feasible in a school setting
■ Increased data collection burden


Impact Evaluation Handout #1



Permission is granted for reproduction by schools for classroom use. Written permission is required for any other use. 




Handout: Data Collection Matrix


Method: Document Review

Focus:
■ Nature and level of school reform activities
■ Incidence of events of interest
■ Existing student achievement information

Advantages:
■ Data already exist
■ Low cost
■ Typically unobtrusive
■ Relatively unbiased

Disadvantages:
■ Lack of quality control
■ Validity and reliability may be unknown
■ Can be limited in scope


Method: Interview

Focus:
■ School staff/parent/student perceptions
■ School staff/parent satisfaction
■ Improvement suggestions
■ Degree of implementation
■ Anticipated and unanticipated outcomes

Advantages:
■ Indepth information
■ Quality control
■ High response rate
■ Opportunity to probe

Disadvantages:
■ Relatively costly
■ Needs trained data collectors
■ Data can be biased
■ May require careful sampling


Method: Survey

Focus:
■ School staff/parent/student perceptions
■ School staff/parent/student satisfaction
■ Improvement suggestions
■ Degree of implementation
■ Anticipated and unanticipated outcomes

Advantages:
■ Relatively low cost
■ Can include structured and open-ended information
■ Relative ease of administration
■ Can cover a large number of respondents

Disadvantages:
■ Response rate often a problem
■ Needs careful sampling
■ Data can be biased
■ Open-ended data may be difficult to analyze


Method: Focus Group

Focus:
■ School staff/parent/student perceptions
■ School staff/parent/student satisfaction
■ Implementation issues
■ Improvement suggestions
■ Degree of implementation
■ Anticipated and unanticipated outcomes

Advantages:
■ Indepth information on program implementation and outcomes
■ Relatively free of response rate problems
■ Interactive discussion among stakeholders

Disadvantages:
■ Relatively high cost
■ Needs trained facilitators
■ May be difficult to achieve appropriate representation in recruitment of participants
■ Group dynamics can bias discussion


Method: Observation

Focus:
■ Program implementation
■ Classroom activities
■ Instructional practices
■ School climate

Advantages:
■ Increased objectivity and authenticity of data
■ Can provide contextual information

Disadvantages:
■ Needs trained observers
■ Relatively high cost
■ Can be obtrusive
■ Often just a snapshot of program implementation
■ May not reflect typical reality


Method: Assessment

Focus:
■ Student performance in cognitive and affective domains

Advantages:
■ Objective data often with known reliability and validity
■ Can be low cost (standardized testing)
■ Can include large samples of students
■ Provides a generally accepted portrayal of schooling outcomes

Disadvantages:
■ May provide a limited and narrow picture of student performance
■ Can be high cost (performance-based assessments)
■ May need careful sampling


Impact Evaluation Handout #2



Permission is granted for reproduction by schools for classroom use. Written permission is required for any other use. 





Handout: Evaluating Program Impact 



Small Group Activity #1 — Collecting Data 

How do we collect data? 



Evaluation Question | Data Collection Method | Data Source | Instrument | Date
In what ways is the school/district administration providing support for the school reform effort? | | | |


Impact Evaluation Handout #3






Permission is granted for reproduction by schools for classroom use. Written permission is required for any other use. 




Handout: Evaluating Program Impact 



Activity #2 — Analyzing Data 

How do we analyze data? 



Evaluation Question | Type of Data | Data Analysis Method | Criteria
In what ways are teachers changing and improving their instructional practice? | | |


Impact Evaluation Handout #4



Permission is granted for reproduction by schools for classroom use. Written permission is required for any other use. 





Handout: Evaluating Program Impact 



Activity #3 — Interpreting Data 

What are the data telling us? 

Percent of Students Meeting State Benchmarks 



Grade (Subject) | 94/95 | 95/96 | 96/97 | 97/98
Fourth (Reading) | 43 | 38 | 46 | 55
Eighth (Reading) | 34 | 31 | 40 | 32
Fourth (Math) | 24 | 36 | 44 | 55
Eighth (Math) | 35 | 29 | 20 | 29

[Graph: Percent of Students Meeting Mathematical Benchmarks]

Narratives:

1.

2.

3.


Impact Evaluation Handout #5



Permission is granted for reproduction by schools for classroom use. Written permission is required for any other use.







Handout: Evaluating Program Impact 



Activity #4— Planning Follow-Up 



[Graph: Percent of Students Meeting Reading Benchmarks; vertical axis: Percent of Students; horizontal axis: School Year]




Northwest Regional Educational Laboratory 



Impact Evaluation Handout #6 





Handout: Evaluating Program Impact 



Activity #4 — Planning Follow-Up 



Evaluation Question | Key Findings | Action To Be Taken | Person Responsible | Date
Is student performance improving over time? | | | |


Impact Evaluation Handout #7



Permission is granted for reproduction by schools for classroom use. Written permission is required for any other use. 




Design Sample 



Instructions to the Presenter 



Evaluation of schoolwide proj-
ects is needed in order to
assess the level and degree of 
student achievement attribut- 
able to change efforts. Various 
evaluation models, theories, and 
approaches have proliferated. A 
single, one-size-fits-all approach 
to evaluation is difficult, if not 
impossible, to define. Rather, a
multiple-method approach will 
be needed and the methods used 
will vary from school to school 
as well. Evaluation is not a sin- 
gle method, design, or approach 
but a variety of activities from 
which to pick and choose as 
appropriate to meet account- 
ability requirements and infor- 
mation needs with available 
resources. A comprehensive eval- 
uation will provide answers to 
all parts of the question, "Who 
does what to whom, with what 
results, at what costs?" A rigor- 
ous evaluation to completely 
answer this question is typi- 
cally beyond the resources of 
most local projects. It is neces- 
sary to decide which parts of 
this question are most relevant 
and feasible to answer in the 
schoolwide evaluation effort. 

The following activity is de- 
signed to help you use the in- 
formation presented in this 
guidebook to identify some 
conceptual distinctions rele- 
vant to evaluating schoolwide 
projects. The type of schoolwide 
evaluation conducted can range 
from a simple impact study with 
little attention paid to imple- 
mentation issues and a focus 
on a single measure of student 
achievement to a complex, 
fully-designed formative 
and summative evaluation. 



In addition to discussing the 
strengths and weaknesses of 
the three evaluation designs, 
the information provided in 
this guidebook can be used to 
determine whether the schools 
have built a rational cause and 
effect relationship between the 
schoolwide model activities and 
their impact on student achieve- 
ment. That is, can the school 
demonstrate that the school- 
wide model being implemented 
has direct relationships to 
changes in student learning? 

Small-Group Activity #5 
(40 minutes) 

This activity should be completed 
at the end of the training. Below 
are three examples of evaluations 
used by schools interested in 
identifying the success of their
schoolwide project. Break into 
three small groups, each group 
taking one of the school scenar- 
ios, and discuss the nature of 
the evaluation using the infor- 
mation provided in this guide- 
book to answer the questions 
following each scenario. At the 
end of the activity, please report 
the results of your discussion to 
the full group. 

School 1 Scenario 

An elementary school with 
grades kindergarten through 
sixth implemented a school- 
wide reading program — School 
Improvement Model A — this past 
year as part of the state's com- 
prehensive school reform initia- 
tive. The schoolwide reading 
model was selected because 
the school's expected ultimate 




outcome of children meeting the 
state reading standards was suc- 
cessfully met in a neighboring 
school that had implemented 
the same reading model. Over- 
all, the principal felt the read- 
ing scores at his school were 
dismal; state assessments on 
writing and math were below 
the 50th percentile, as well, but 
the principal thought changes 
to the entire school curriculum 
would be too overwhelming for 
his school staff to endorse. 

Support from the schoolwide 
reading model developers con- 
sisted of a week-long training 
session for 12 of the 15 teachers 
two weeks before the beginning 
of school. The focus of the train- 
ing was how to implement the 
reading model. Part of the train- 
ing stressed the importance of 
completing a checklist of im- 
plementation indicators every 
eight weeks so staff could self- 
assess how well they were im- 
plementing the model's reading 
components; no other support 
was provided by the reading 
model developers. The three 
teachers who did not receive 
the staff development training 
received literature on the newly 
implemented model and were 
briefed by those who attended 
the training. None of the teach- 
ers reviewed the grant proposal 
that was awarded federal funds
to implement the school reform 
model. Additionally, the lone 
support from the local school 
district came in the form of 
funds to implement the spe- 
cific schoolwide model. 






The school evaluation plan 
took a minimalist approach to 
identifying model success; in- 
crease in student achievement 
was the sole impact criterion 
of the school. Baseline data on 
children's reading scores were 
at or below the 30th percentile 
as measured by the California 
Achievement Test (CAT). The 
goal of the school was to get 
90 percent of the underachiev- 
ing children to make one and 
a half years of progress on the 
reading section of the CAT. 
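
School 1's single impact criterion can be checked with simple arithmetic. The sketch below is illustrative only. It assumes that fall and spring CAT reading results are available as grade-equivalent scores and that "one and a half years of progress" means a gain of at least 1.5 grade equivalents; the scenario states neither. The student records and the names GROWTH_TARGET, SHARE_TARGET, and share_meeting_target are hypothetical.

```python
# Hypothetical sketch: share of students reaching School 1's growth goal.
# Assumes pre/post scores are grade equivalents (an assumption that is
# not stated in the scenario).

GROWTH_TARGET = 1.5   # years of progress required per student
SHARE_TARGET = 0.90   # 90 percent of underachieving students


def share_meeting_target(scores, growth_target=GROWTH_TARGET):
    """Return the fraction of students whose gain is at least growth_target.

    scores: list of (pretest, posttest) grade-equivalent pairs.
    """
    if not scores:
        raise ValueError("no student scores supplied")
    met = sum(1 for pre, post in scores if post - pre >= growth_target)
    return met / len(scores)


# Made-up records for four students: (fall score, spring score)
example_scores = [(2.1, 3.8), (1.9, 3.0), (2.5, 4.0), (3.0, 4.6)]
share = share_meeting_target(example_scores)
goal_met = share >= SHARE_TARGET
print(f"{share:.0%} of students met the growth target; goal met: {goal_met}")
```

A calculation of this kind addresses only the impact criterion; it offers no evidence about how faithfully the model was implemented.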

Discussion Questions for 
Activity #5, School 1 

1. What are the strengths of this 
evaluation? 

2. What are the limitations of 
this evaluation? 

3. What would improve the 
evaluation, at both the forma- 
tive and summative stages? 

4. Will there be evidence for fi- 
delity of model implementation? 

5. Is there sufficient evidence
collected to demonstrate the 
school's progress toward its goal? 

6. What evaluation model (for 
example, growth or pretest- 
posttest) is being utilized? 
What are the strengths and 
disadvantages of using this 
evaluation model? 



School 2 Scenario 

Staff at School 2 spent one year 
reviewing their school's strategic 
plans, the districtwide needs 
assessment, recent standard- 
ized tests, and parent surveys 
to help identify goals for the 
coming year in their elemen- 
tary school. These data helped 
the school staff decide to imple- 
ment a schoolwide model to help 
students become proficient in 
reading. Along with community 
members, the school staff felt 
that implementation of a more 
structured reading program 
would prepare students to 
meet reading standards set by 
the state and school district. 

The school decided it would 
need to implement a model that 
would achieve its goals of (1) 
getting all parents and children 
involved in the school program, 
and (2) bringing all students 
within one grade level in read- 
ing as measured by the state 
standardized test and with 80 
percent of the children passing 
the state benchmark assessment. 
Based on their desired outcomes,
School 2 selected School Improvement
Model B to provide
the best opportunity for the 
growth of their students. The 
staff also felt that the model 
supplemented its current math 
and writing curricula. The 
school also receives financial 
support and technical assistance 
from its local school district. The 
support offers teachers a chance 
to receive professional develop- 
ment and to obtain the appro- 
priate materials and equipment. 



Although the model chosen 
by the school supported the 
nine required components of 
CSRD, little evaluation consid- 
eration was given to each of 
the components. For example, 
no data are to be collected on 
sustained support within the 
school after the initial imple- 
mentation of the model. How- 
ever, the staff plan to work with 
the model developers on data 
collection surrounding the for- 
mative evaluation. Model B 
contains a schoolwide plan for 
instruction, assessment, class- 
room management, professional 
development, and parent involve- 
ment. The model focuses on 
shared reading, vocabulary 
building, and writing activities. 
Teachers have a detailed guide 
for teaching each component. 
The staff receive year-round 
professional development from 
the model developers. In addition
to initial professional development
provided by the model developers
at the beginning of the school
year, school component meetings
are conducted throughout the year.
During the first year of opera- 
tion the school will receive 
two implementation checks 
from the model developers, with 
two implementation checks con- 
ducted during the second year. 
The model developers will use 
their own checklists to ensure 
proper model implementation. 
Annual curriculum refresher 
courses are offered to new 
teachers and anyone else on 
staff who feels the need for 
additional training. 



The model has specific bench- 
marks that align well with the 
state benchmarks. Therefore, the 
students will be assessed every 
two months on the model's cur- 
riculum-based measure, and 
those children who show the 
greatest need will get addi- 
tional help with their reading. 
The children are also assessed 
annually on the school district 
benchmark, as well as at third 
and sixth grades on the state 
benchmark assessment. 

The model developers provide
reports to the school staff on
what is going well in the school
and the next steps needed for
proper implementation. Data from the state
reading test will provide the 
school staff with indicators 
of student achievement gains. 

At the end of the second year 
the school will hire a school dis- 
trict evaluator to help them com- 
pile, analyze, and interpret the 
comprehensive implementation 
data and the district and state 
benchmark assessments. These 
data will provide the staff with 
the information to determine 
changes in student achievement. 

Once the data have been ana- 
lyzed and interpreted, a report 
will be provided to the school to 
make any programmatic changes 
necessary to further improve 
students' academic success and 
improve parent involvement in 
the school. 



Discussion Questions for 
Activity #5, School 2 

1. What are the strengths of this 
evaluation? 

2. What are the limitations of 
this evaluation? 

3. What would improve the 
evaluation, at both the formative
and summative stages?

4. Will there be evidence for fi- 
delity of model implementation? 

5. Is there sufficient evidence 
collected to demonstrate the 
school's progress toward its goal? 

6. What evaluation model (for
example, growth or pretest-
posttest) is being utilized?
What are the strengths and
disadvantages of using this
evaluation model?

School 3 Scenario 

Upon hearing that the state of 
Oregon would fund 20 Compre- 
hensive School Reform Demon- 
stration (CSRD) sites in the 
coming year, staff at School 
3 began to review their school's
strategic plans, the districtwide 
needs assessment, recent stan- 
dardized tests, and parent sur- 
veys to identify areas in which 
they could help children per- 
form better in school. These 
data helped the school staff 
decide that a new schoolwide
model could indeed help their 
students become more proficient 
in reading, an area where the 
latest district assessments indi- 
cated School 3's children were 
performing miserably. Along with
community members, the school
staff felt that implementing a
more structured reading program
would prepare students to
meet reading standards set by 
the state and school district. The 
school staff recently implemented 
a new schoolwide math model 
and a new literacy model, and 
the staff thought the implemen- 
tation of a new reading model 
would provide students with the 
richest of environments in which 
to learn. After support among 
school staff was obtained for im- 
plementing a new model, a com- 
mittee of teachers, the principal, 
and school district staff wrote a 
proposal for CSRD funding. School 
staff interested in reviewing the 
grant were encouraged to offer 
feedback. Once the proposal was 
funded, all teachers were re- 
quired to read the proposal. 

The primary goal — as deter- 
mined by the CSRD Advisory 
Committee made up of school 
staff, district staff, and parents 
of children attending School 3 — 
was for students to become more 
proficient in reading. Breaking 
this goal down even further, the 
measurable objectives were to 
increase the number of children 
reading at grade level by 2 per- 
cent each year and increase the 
number of children meeting the 
Oregon state standard for reading 
by 10 percent each year. The local 
school district provided a third- 
party evaluator to assist in defin- 
ing measurable goals and to help 
the staff identify how these goals 
could be achieved through a 
schoolwide model. The evaluator 
assisted in helping the school 
identify a research-based model 
that included classroom activities, 
curriculum, resources, and assess- 
ments that would help children 
perform better in School 3. The 
model chosen to support chil- 
dren's learning was School Im- 
provement Model C. 






Model C contains a schoolwide 
plan for instruction, assessment, 
classroom management, profes- 
sional development, and par- 
ent involvement. The model 
focuses on shared reading, vo- 
cabulary building, and writing 
activities. Teachers have a de- 
tailed guide for teaching each 
component. The staff receives 
year-round professional devel- 
opment from the model devel- 
opers. In addition to receiving 
initial professional develop- 
ment at the beginning of the 
school year by the model de- 
velopers, school component 
meetings are conducted through- 
out the year. Annual curriculum 
refresher courses are offered to 
new teachers and anyone else 
on staff who feels the need for 
additional training. 

The advisory committee will 
oversee both the formative and 
summative evaluation. The com- 
mittee will meet at least every 
two months to review the on- 
going data collection. During 
the first year of operation, the 
school will receive three imple- 
mentation checks from the 
model developers, with two im- 
plementation checks conducted 
during the second year. This 
advisory committee, with the 
help of the model developers, will 
create a calendar and checklist 
to aid in the tracking of appro- 
priate model implementation. 
Interviews and surveys of stu- 
dents, teachers, and parents will 
be used to collect information 
on various aspects of model im- 
plementation. Additionally, 
classroom observations and 
focus groups with teachers will 
provide valuable data on how 
the comprehensive program is 
being implemented. The advisory
committee's goal will be to
verify the success of the model
implementation and make any 
modifications to classroom in- 
struction, parent involvement, 
or other program components. 

School 3's evaluation plan will 
identify progress toward its goal 
using both state and local data 
assessments. To measure progress 
using state assessments, School 3
will use Title I Adequate Yearly 
Progress Criteria as a measure of 
academic progress. Local student 
performance measures are impor- 
tant to School 3 as well. The 
student performance goal is to 
improve student achievement in 
reading with the objective of in- 
creasing the percentage of stu- 
dents in grades one through six 
reading at grade level by the end 
of the first year of implementa- 
tion by 2 percent. Multiple mea- 
sures will be used to assess these 
changes. For example, local pre- 
and post-reading assessments will 
be administered as will the CSRD 
model's 10-week assessment. The 
final assessment will be a local 
literacy assessment to be admin- 
istered at the beginning and end 
of the school year. To ensure that 
the program is on the right track,
School 3 created interim bench- 
marks. The objective of the in- 
terim benchmark is to increase 
the number of students reading 
at grade level by 0.6 percent each 
trimester. Students will be as- 
sessed with the model's 10-week 
assessment, the local reading as- 
sessment, and nightly reading 
homework records. Where possi- 
ble, the assessments will be con- 
ducted in the spring and fall. For 
example, fall and spring assess- 
ments on oral reading samples 
will be conducted to identify 
changes in student reading strat- 
egies and understanding of text. 
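
The annual objective and the interim benchmark in this plan can be compared directly. The sketch below assumes that both figures are percentage-point increases in the share of students reading at grade level; the scenario does not say whether they are points or relative changes. The baseline value and the function name projected_share are hypothetical.

```python
# Hypothetical sketch: do School 3's trimester benchmarks add up to the
# annual objective? Assumes both are percentage-point increases.

ANNUAL_OBJECTIVE = 2.0      # points per year (from the scenario)
TRIMESTER_BENCHMARK = 0.6   # points per trimester (from the scenario)
TRIMESTERS_PER_YEAR = 3


def projected_share(baseline, trimesters, step=TRIMESTER_BENCHMARK):
    """Percent of students at grade level after a number of trimesters,
    if each interim benchmark is met exactly."""
    return baseline + step * trimesters


baseline = 40.0  # made-up starting percentage, for illustration only
end_of_year = projected_share(baseline, TRIMESTERS_PER_YEAR)
cumulative_gain = end_of_year - baseline
shortfall = ANNUAL_OBJECTIVE - cumulative_gain

print(f"Gain from interim benchmarks: {cumulative_gain:.1f} points")
print(f"Shortfall against annual objective: {shortfall:.1f} points")
```

Under these assumptions, three trimesters at 0.6 points yield 1.8 points, slightly short of the 2 percent annual objective. A workshop group may wish to consider whether the interim benchmarks and the annual objective are aligned.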



As is evident, School 3's evaluation
plan has two purposes:
to document project activities 
and monitor progress toward 
expected outcomes and to sum- 
marize the overall progress of 
the plan's effectiveness. School 
3 is also concerned that each of 
the nine CSRD components is 
addressed in the program eval- 
uation. For each of the nine 
components, specific processes 
used to review, monitor, and 
adjust the school program are 
included as part of the evalua- 
tion plan. Some of the evalua- 
tion tools will be administered 
by the local evaluator, while 
others will be administered 
by the CSRD's model developer. 
Still others will be administered 
by the advisory committee staff. 
The tables below offer part of 
the evaluation of the nine 
CSRD components. 





Component 1: Effective Research-Based
Strategies

Goal

■ Implement the CSRD plan
successfully

■ Align classroom practice
to Oregon benchmark

Indicator/Strategy

■ Implement strategies
as intended by model

■ Analysis of change in
classroom practice

Measurement

■ Monitoring

■ Teacher reflections on
changes in classroom practices

Who

■ Advisory committee

■ Model developer

When

■ 3 visits per year

■ Each term



Component 2: Comprehensive Design
Goal 

■ Implement, monitor, and 
refine CSRD plan on ongoing 
basis 

Indicator/Strategy 

■ Review progress by checking 
interim student achievement 
data 

Measurement 

■ Implementation checklist 

■ Review and evaluate 
disaggregated data 

Who 

■ Advisory committee 

When 

■ Each term 



Professional Development 
Goal 

■ Implement a professional 
development plan that results 
in positive change in reading 
and parent involvement 

Indicator/Strategy 

■ Ensure full participation
in activities 

■ Change in classroom 
practices 

Measurement 

■ Attendance at each activity 

■ Classroom observation 

Who 

■ Advisory committee 

■ Evaluator 

When 

■ Each term 

■ Ongoing 




School Support 
Goal 

■ Implement a professional 
development plan that results 
in positive change in reading 
and parent involvement 

Indicator/Strategy 

■ Advisory committee will 
communicate and solicit
feedback 

Measurement 

■ Polling of staff by secret 
ballot to identify continued 
support of the model 

Who 

■ Evaluator 

When 

■ Annually 



Parent and Community 
Involvement 

Goal 

■ Intact and functioning 
family support team 

■ Family participation in 20 
minutes of reading homework 
nightly 

Indicator/Strategy

■ Weekly team meetings,
develop support plans for
struggling youth

■ Homework with parent 
signoff sheet 

Measurement 

■ Model monitoring process 

■ Monitor number of returned 
assignments 

Who 

■ Advisory committee 

■ Evaluator 

When 

■ Annually 

■ Each term 






Once the data have been col- 
lected, the evaluator will work 
with the advisory committee on 
ways to analyze the data. Then, 
working as a group, they will 
begin to interpret the data. 

Once the final report has been 
completed, the evaluator and a 
member of the advisory commit- 
tee will present the findings at 
a community forum. Although 
programmatic changes were 
made throughout the project 
period, the final report will 
provide additional evidence 
for possible changes in pro- 
gram practices. 

Discussion Questions for 
Activity #5, School 3 

1. What are the strengths of this 
evaluation? 

2. What are the limitations of 
this evaluation? 

3. What would improve the eval- 
uation, at both the formative 
(implementation) and summa- 
tive (impact) stages? 

4. Will there be evidence for fi- 
delity of model implementation? 

5. Is there sufficient evidence 
collected to demonstrate the 
school's progress toward its goal? 

6. What evaluation model (such 
as growth or pretest-posttest) 
is being utilized? What are the 
strengths and disadvantages of 
using this evaluation model?

In addition to answering the
questions after each scenario,
discuss whether the school
personnel or evaluator would be
able to complete the following
schoolwide evaluation worksheet.
If information is missing for any
component of the worksheet,
discuss whether that information 
may be important to the school 
and, if so, what changes in the 
evaluation design would need 
to occur to provide sufficient 
evidence for program success. 

Evaluation Framework 
Schoolwide Evaluation 

Basic Evaluation Framework:
The following is a brief de- 
scription of the elements needed 
for a sound school evaluation 
design. An evaluation design 
should express student perfor- 
mance goals. Ideally, the goals 
highlighted in the evaluation 
design should encompass, but 
not be limited to, all existing 
goals identified by your school 
in your schoolwide plan. Each 
identified student performance 
goal has a specific objective, strat- 
egy for attainment, indicators 
and benchmarks, and measure- 
ment method. 
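
The five framework elements named above lend themselves to a simple completeness check, which is essentially what the worksheet activity asks participants to do by hand. The sketch below is a minimal illustration, not part of the guide's method: the class GoalPlan, its field names, and the function missing_elements are hypothetical, and the School 1 entries are paraphrased from the scenario, with the indicators field left blank purely for illustration.

```python
# Hypothetical sketch: check an evaluation plan against the five framework
# elements named in the text. Class and field names are illustrative.
from dataclasses import dataclass, fields


@dataclass
class GoalPlan:
    goal: str = ""
    objective: str = ""
    strategy: str = ""
    indicators_and_benchmarks: str = ""
    measurement_method: str = ""


def missing_elements(plan):
    """Return the names of framework elements left blank in the plan."""
    return [f.name for f in fields(plan) if not getattr(plan, f.name).strip()]


# Entries paraphrased from the School 1 scenario; the indicators field is
# left blank here for illustration only.
school1 = GoalPlan(
    goal="Children meet the state reading standards",
    objective="90 percent of underachieving children gain 1.5 years on CAT reading",
    strategy="Schoolwide reading program (School Improvement Model A)",
    measurement_method="California Achievement Test, reading section",
)
print(missing_elements(school1))
```

Any element returned by the check marks a place where the group should discuss whether the missing information matters to the school and how the evaluation design would need to change to supply it.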

Finally, discuss in your group 
whether the school in each of 
the scenarios has built a rational 
cause-and-effect relationship 
between the schoolwide model 
activities and their impact on 
student achievement. That is, 
can the school demonstrate that 
the model being implemented 
has a direct relationship to 
changes in student learning? 
For example, does a school's 
evaluation model identify how 
instructional elements (such as 
project-based activities or cur- 
riculum aligned to standards) 
relate to expected changes in 
how students learn, feel, and do 
in school? Furthermore, does 
the model identify the types of 
changes in student performance 
(such as attendance or problem- 



solving skills) that lead to at- 
taining the desired standard 
(say, meeting statewide per- 
formance standards)? 



Student performance goals: What do we want students to
ultimately achieve?

A general description of student goals.

Objectives: What do students need to specifically achieve to
accomplish goals?

A specific, measurable description of student performance
that identifies a time frame for achieving goals.

Strategies for attainment: What do schools have to do to help
students accomplish goals and objectives?

A description of the strategies, means, and methods used by
schools to accomplish student performance goals.

Local indicators and benchmarks: What evidence do we need to
demonstrate progress toward goals?

A specific description of the state, local, and interim indicators
and benchmarks to be used to measure progress toward student
performance goals and objectives.

Measurement methods: How will we gather the evidence needed
to demonstrate successful achievement of goals?

A specific description of the instruments or methods to be
used to gather evidence of progress toward attainment of
student performance goals and objectives.

Source: Guidelines for preparing a charter school accountability plan, Massachusetts
Department of Education




Examples of Data Collection Techniques Related to Implementation 



Northwest Regional Educational Laboratory, Design Sample Transparency #1




Discussion Questions for Activity #5 






Northwest Regional Educational Laboratory, Design Sample Transparency #2





New Curriculum Implemented 






Northwest Regional Educational Laboratory, Design Sample Transparency #3







Handout: School 1 Scenario 



An elementary school with grades kindergarten through sixth implemented a schoolwide reading
program, School Improvement Model A, this past year as part of the state's comprehensive
school reform initiative. The schoolwide reading model was selected because the school's expected 
ultimate outcome of children meeting the state reading standards was successfully met in a neigh- 
boring school that had implemented the same reading model. Overall, the principal felt the reading 
scores at his school were dismal; state assessments on writing and math were below the 50th percentile, 
as well, but the principal thought changes to the entire school curriculum would be too overwhelming 
for his school staff to endorse.

Support from the schoolwide reading model developers consisted of a week-long training session for 
12 of the 15 teachers two weeks before the beginning of school. The focus of the training was how to 
implement the reading model. Part of the training stressed the importance of completing a checklist
of implementation indicators every eight weeks so staff could self-assess how well they were imple- 
menting the model's reading components; no other support was provided by the reading model develop- 
ers. The three teachers who did not receive the staff development training received literature on the 
newly implemented model and were briefed by those who attended the training. None of the teachers 
reviewed the grant proposal that was awarded federal funds to implement the school reform model. 
Additionally, the lone support from the local school district came in the form of funds to implement 
the specific schoolwide model. 

The school evaluation plan took a minimalist approach to identifying model success; increase in 
student achievement was the sole impact criterion of the school. Baseline data on children's reading 
scores were at or below the 30th percentile as measured by the California Achievement Test (CAT). 
The goal of the school was to get 90 percent of the underachieving children to make one and a half 
years of progress on the reading section of the CAT. 




Northwest Regional Educational Laboratory 




Design Sample Handout #1





Handout: School 2 Scenario 



Staff at School 2 spent one year reviewing their school's strategic plans, the districtwide needs
assessment, recent standardized tests, and parent surveys to help identify goals for the upcoming
year in their elementary school. These data helped the school staff decide to implement a schoolwide 
model to help students become proficient in reading. Along with community members, the school staff 
felt that implementation of a more structured reading program would prepare students to meet
reading standards set by the state and school district.

The school decided it would need to implement a model that would achieve its goals of (1) getting 
all parents and children involved in the school program, and (2) bringing all students within one grade
level in reading as measured by the state standardized test and with 80 percent of the children passing
the state benchmark assessment. Based on their desired outcomes, School 2 selected School Improvement
Model B to provide the best opportunity for the growth of their students. The staff also felt that
the model supplemented its current math and writing curricula. The school also receives financial 
support and technical assistance from its local school district. The support offers teachers a chance 
to receive professional development and to obtain the appropriate materials and equipment.

Although the model chosen by the school supported the nine required components of CSRD, little 
evaluation consideration was given to each of the components. For example, no data are to be collected
on sustained support within the school after the initial implementation of the model. However, the 
staff plan to work with the model developers on data collection surrounding the formative evaluation.
Model B contains a schoolwide plan for instruction, assessment, classroom management, professional 
development, and parent involvement. The model focuses on shared reading, vocabulary building, and 
writing activities. Teachers have a detailed guide for teaching each component. The staff receive year-round
professional development from the model developers. In addition to initial professional development
provided by the model developers at the beginning of the school year, school component
meetings are conducted throughout the year. During the first year of operation the school will receive
two implementation checks from the model developers, with two implementation checks conducted 
during the second year. The model developers will use their own checklists to ensure proper model 
implementation. Annual curriculum refresher courses are offered to new teachers and anyone else 
on staff who feels the need for additional training. 

The model has specific benchmarks that align well with the state benchmarks. Therefore, the students
will be assessed every two months on the model's curriculum-based measure, and those children who 
show the greatest need will get additional help with their reading. The children are also assessed an- 
nually on the school district benchmark, as well as at third and sixth grades on the state benchmark 
assessment. 

The model developers provide reports to the school staff on what is going well in the
school and the next steps needed for proper implementation. Data from the state reading
test will provide the school staff with indicators of student achievement gains.

At the end of the second year the school will hire a school district evaluator to help them compile, 
analyze, and interpret the comprehensive implementation data and the district and state benchmark
assessments. These data will provide the staff with the information to determine changes in student 
achievement. 

Once the data have been analyzed and interpreted, a report will be provided to the school to make 
any programmatic changes necessary to further improve students' academic success and improve parent 
involvement in the school. 



Northwest Regional Educational Laboratory




Design Sample Handout #2 



Permission is granted for reproduction by schools for classroom use. Written permission is required for any other use. 




Handout: School 3 Scenario 



Upon hearing that the state of Oregon would fund 20 Comprehensive School Reform Demonstration
(CSRD) sites in the coming year, staff at School 3 began to review their school's strategic plans, 
the districtwide needs assessment, recent standardized tests, and parent surveys to identify areas in 
which they could help children perform better in school. These data helped the school staff decide 
that a new schoolwide model could indeed help their students become more proficient in reading, 
an area where the latest district assessments indicated School 3's children were performing miserably. 
Along with community members, the school staff felt that implementation of a more structured reading 
program would prepare students to meet reading standards set by the state and school district. The 
school staff recently implemented a new schoolwide math model and a new literacy model, and the 
staff thought the implementation of a new reading model would provide students with the richest 
of environments in which to learn. After support among school staff was obtained for implementing
a new model, a committee of teachers, the principal, and school district staff wrote a proposal for CSRD 
funding. School staff interested in reviewing the grant were encouraged to offer feedback. Once the 
proposal was funded, all teachers were required to read the proposal. 

The primary goal — as determined by the CSRD Advisory Committee made up of school staff, district 
staff, and parents of children attending School 3 — was for students to become more proficient in read- 
ing. Breaking this goal down even further, the measurable objectives were to increase the number of 
children reading at grade level by 2 percent each year and increase the number of children meeting 
the Oregon state standard for reading by 10 percent each year. The local school district provided a 
third-party evaluator to assist in defining measurable goals and to help the staff identify how these 
goals could be achieved through a schoolwide model. The evaluator assisted in helping the school 
identify a research-based model that included classroom activities, curriculum, resources, and assess- 
ments that would help children perform better in School 3. The model chosen to support children's 
learning was School Improvement Model C. 

Model C contains a schoolwide plan for instruction, assessment, classroom management, professional 
development, and parent involvement. The model focuses on shared reading, vocabulary building, and 
writing activities. Teachers have a detailed guide for teaching each component. The staff receives year- 
round professional development from the model developers. In addition to receiving initial profes- 
sional development at the beginning of the school year by the model developers, school component 
meetings are conducted throughout the year. Annual curriculum refresher courses are offered to new 
teachers and anyone else on staff who feels the need for additional training. 



The advisory committee will oversee both the formative and summative evaluation. The committee will 
meet at least every two months to review the ongoing data collection. During the first year of operation, 
the school will receive three implementation checks from the model developers, with two implementation 
checks conducted during the second year. This advisory committee, with the help of the model developers, 
will create a calendar and checklist to aid in the tracking of appropriate model implementation. Interviews 
and surveys of students, teachers, and parents will be used to collect information on various aspects of 
model implementation. Additionally, classroom observations and focus groups with teachers will provide 
valuable data on how the comprehensive program is being implemented. The advisory committee's goal 
will be to verify the success of the model implementation and make any modifications to classroom in- 
struction, parent involvement, or other program components. 






School 3's evaluation plan will identify progress toward its goal using both state and local data
assessments. To measure progress using state assessments, School 3 will use Title I Adequate Yearly Progress
Criteria as a measure of academic progress. Local student performance measures are important to School 
3 as well. The student performance goal is to improve student achievement in reading with the objec- 
tive of increasing the percentage of students in grades one through six reading at grade level by the 
end of the first year of implementation by 2 percent. Multiple measures will be used to assess these 
changes. For example, local pre- and post-reading 'assessments will be administered as will the CSRD 






model's 10-week assessment. The final assessment will be a local literacy assessment to be administered
at the beginning and end of the school year. To ensure that the program is on the right track,
School 3 created interim benchmarks. The objective of the interim benchmark is to increase the number
of students reading at grade level by 0.6 percent each trimester. Students will be assessed with the
model's 10-week assessment, the local reading assessment, and nightly reading homework records.
Where possible, the assessments will be conducted in the spring and fall. For example, fall and spring
assessments on oral reading samples will be conducted to identify changes in student reading strategies
and understanding of text.
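
For staff who keep assessment counts in a spreadsheet or script, the benchmark arithmetic described above could be checked along the following lines. This is only an illustrative sketch, not part of School 3's plan: it assumes the 2 percent and 0.6 percent targets are meant as percentage-point gains in the share of students reading at grade level, and the names (Checkpoint, gains, report) and all student counts are hypothetical.

```python
from dataclasses import dataclass

# Targets taken from the scenario text; reading them as percentage
# points is an assumption of this sketch.
ANNUAL_TARGET_POINTS = 2.0      # year-end gain in percent at grade level
TRIMESTER_TARGET_POINTS = 0.6   # interim gain expected each trimester


@dataclass
class Checkpoint:
    """One assessment window (hypothetical structure)."""
    label: str
    students_assessed: int
    at_grade_level: int

    @property
    def percent_at_grade_level(self) -> float:
        return 100.0 * self.at_grade_level / self.students_assessed


def gains(checkpoints):
    """Return (label, gain in percentage points) for each consecutive pair."""
    return [
        (later.label, later.percent_at_grade_level - earlier.percent_at_grade_level)
        for earlier, later in zip(checkpoints, checkpoints[1:])
    ]


def report(checkpoints):
    """Compare each trimester gain and the full-year gain with the targets."""
    lines = []
    for label, gain in gains(checkpoints):
        status = "met" if gain >= TRIMESTER_TARGET_POINTS else "not met"
        lines.append(f"{label}: {gain:+.1f} points, interim benchmark {status}")
    total = (checkpoints[-1].percent_at_grade_level
             - checkpoints[0].percent_at_grade_level)
    status = "met" if total >= ANNUAL_TARGET_POINTS else "not met"
    lines.append(f"Year: {total:+.1f} points, annual objective {status}")
    return lines


# Invented counts, for illustration only.
school_year = [
    Checkpoint("Fall baseline", 500, 250),
    Checkpoint("Trimester 1", 500, 254),
    Checkpoint("Trimester 2", 500, 256),
    Checkpoint("Trimester 3", 500, 261),
]

for line in report(school_year):
    print(line)
```

The sketch checks the interim and annual targets separately because three trimester gains of exactly 0.6 would total 1.8, slightly under the annual figure of 2, so a school could meet every interim benchmark and still fall short of the year-end objective.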

As is evident, School 3's evaluation plan has two purposes: to document project activities and monitor
progress toward expected outcomes and to summarize the overall progress of the plan's effectiveness.
School 3 is also concerned that each of the nine CSRD components is addressed in the program
evaluation. For each of the nine components, specific processes used to review, monitor, and adjust
the school program are included as part of the evaluation plan. Some of the evaluation tools will be
administered by the local evaluator, while others will be administered by the CSRD's model developer.
Still others will be administered by the advisory committee staff. The tables below offer part of the
evaluation of the nine CSRD components.





Effective Research-Based Strategies (Component 1)

Goal

■ Implement the CSRD plan successfully

■ Align classroom practice to Oregon benchmark

Indicator/Strategy

■ Implement strategies as intended by model

■ Analysis of change in classroom practice

Measurement

■ Monitoring

■ Teacher reflections on changes in classroom practices

Who

■ Advisory committee

■ Model developer

When

■ 3 visits per year

■ Each term




Comprehensive Design (Component 2)

Goal

■ Implement, monitor, and refine CSRD plan on ongoing basis

Indicator/Strategy

■ Review progress by checking interim student achievement data

Measurement

■ Implementation checklist

■ Review and evaluate disaggregated data

Who

■ Advisory committee

When

■ Each term




Professional Development 
Goal 

■ Implement a professional 
development plan that results 
in positive change in reading 
and parent involvement 

Indicator/Strategy 

■ Ensure full participation 
in activities 

■ Change in classroom 
practices 

Measurement 

■ Attendance at each activity 

■ Classroom observation 

Who 

■ Advisory committee 

■ Evaluator 

When 

■ Each term 

■ Ongoing 






Once the data have been collected, the evaluator will work with the advisory committee on ways 
to analyze the data. Then, working as a group, they will begin to interpret the data. Once the final 
report has been completed, the evaluator and a member of the advisory committee will present the 
findings at a community forum. Although programmatic changes were made throughout the project 
period, the final report will provide additional evidence for possible changes in program practices. 



School Support (Component 5)

Goal

■ Implement a professional development plan that results in positive change in reading and parent involvement

Indicator/Strategy

■ Advisory committee will communicate and solicit feedback

Measurement

■ Polling of staff by secret ballot to identify continued support of the model

Who

■ Evaluator

When

■ Annually





Parent and Community Involvement (Component 6)

Goal

■ Intact and functioning family support team

■ Family participation in 20 minutes of reading homework nightly

Indicator/Strategy

■ Weekly team meetings, develop support plans for struggling youth

■ Homework with parent signoff sheet

Measurement

■ Model monitoring process

■ Monitor number of returned assignments

Who

■ Advisory committee

■ Evaluator

When

■ Annually

■ Each term








Design Sample Handout #3 






Permission is granted for reproduction by schools for classroom use. Written permission is required for any other use.







Handout: Discussion Questions for Activity 5 



1. What are the strengths of this evaluation? 



2. What are the limitations of this evaluation? 



3. What would improve the evaluation, at both the formative (implementation) and summative 
(impact) stages? 



4. Will there be evidence for fidelity of model implementation? 



5. Is there sufficient evidence collected to demonstrate the school's progress toward its goal? 



6. What evaluation model (such as growth, pretest-posttest) is being utilized? What are the 
strengths and disadvantages of using this evaluation model? 









Design Sample Handout #4 



Permission is granted for reproduction by schools for classroom use. Written permission is required for any other use. 




Handout: Schoolwide Evaluation Worksheet



Basic Evaluation Framework: The following is a brief description of the elements needed for a sound 
school evaluation design. An evaluation design should express student performance goals. Ideally, the 
goals highlighted in the evaluation design should encompass, but not be limited to, all existing goals 
identified by your school in your schoolwide plan. Each identified student performance goal has a spe- 
cific objective, strategy for attainment, indicators and benchmarks, and measurement method. 



Student performance goals: What do we want students to ultimately achieve?

A general description of student goals.

Objectives: What do students need to specifically achieve to accomplish goals?

A specific, measurable description of student performance that identifies a time frame for achieving goals.

Strategies for attainment: What do schools have to do to help students accomplish goals and objectives?

A description of the strategies, means, and methods used by schools to accomplish student performance goals.

Local indicators and benchmarks: What evidence do we need to demonstrate progress toward goals?

A specific description of the state, local, and interim indicators and benchmarks to be used to measure
progress toward student performance goals and objectives.

Measurement methods: How will we gather the evidence needed to demonstrate successful achievement of goals?

A specific description of the instruments or methods to be used to gather evidence of progress toward
attainment of student performance goals and objectives.

Source: Guidelines for preparing a charter school accountability plan, Massachusetts Department of Education






Design Sample Handout #5



Permission is granted for reproduction by schools for classroom use. Written permission is required for any other use. 






Resources 




Below is a listing of useful print and online information resources that relate to evaluating schoolwide
reform programs, and a listing of technical assistance providers. Most of the print resources
may be borrowed from the Comprehensive Center's Resource Center. Please contact the Comprehensive
Center for more information.

Print 



Data Use Tools 

Bernhardt, V.L. (1998). Data analysis for comprehensive schoolwide improvement. Larchmont, NY: Eye
on Education.

Targeted at non-statisticians, this practical toolbook shows educators how to gather, analyze, and use 
data to improve all aspects of schools. 

Holcomb, E.L. (1999). Getting excited about data: How to combine people, passion, and proof. Newbury
Park, CA: Corwin Press.

This practical manual answers questions about what data to collect, how to analyze data, and how to 
use the data to align school improvement. 

Levesque, K., Bradby, D., Rossi, K., & Teitelbaum, P. (1998). At your fingertips: Using everyday data to 
improve schools. Berkeley, CA: MPR Associates, Berkeley, CA: National Center for Research in Voca- 
tional Education, & Arlington, VA: American Association of School Administrators. 

This workbook is designed to help educators use a variety of data to better manage, monitor, and im- 
prove schools. The workbook is structured to help teams and individuals develop performance indica- 
tor systems that can be used to identify strengths and weaknesses and to develop strategies to meet 
educational goals. 

Roza, M. (1998). A toolkit for using data to improve schools: Raise student achievement by incorporating 
data analysis in school planning. Newton, MA: Education Development Center, New England Compre- 
hensive Assistance Center. 

The Toolkit is intended for use by school and district staff interested in using data to improve school 
programs. This resource will enable users to collect, understand, and use data for creating and improv- 
ing schoolwide plans designed to increase student achievement. The Toolkit comes with a companion 
resource, the Data Templates, designed to help collect, disaggregate, and display baseline data. 

Wagner, M., Fiester, L., Reisner, E., Murphy, D., & Golan, S. (1997). Making information work for you:
A guide for collecting good information and using it to improve comprehensive strategies for children, 
families, and communities. Washington, DC: U.S. Department of Education. 

This evaluator's toolkit provides evaluation methods and instruments that schools can use to collect 
sound information and document program progress. Suggestions are included for starting the evalua- 
tion process and documenting results. 






Evaluation Tools 



Beyer, B.K. (1995). How to conduct a formative evaluation. Alexandria, VA: Association for Supervision 
and Curriculum Development. 

This book describes how to conduct an evaluation of educational programs by assessing the program 
during various stages of its development. The author provides practical checklists, data-collection in- 
struments, and other resources to assist in conducting the evaluation. 

Billig, S.H., & Kraft, N.P. (1996). Linking Title I and service-learning: A planning, implementation, and 
evaluation guide. Denver, CO: RMC Research Corporation. 

This guide provides guidelines for program planning, operations, and evaluations for Title I programs. 
Section IV discusses how to evaluate the impact of a program and how to improve its effectiveness. 

Cicchinelli, L.F., & Barley, Z. (1999). Evaluating for success. Comprehensive school reform: An evaluation
guide for districts and schools. Aurora, CO: Mid-continent Research for Education and Learning. 

This guide provides practical information, tips, and tools to help schools and districts meet the evalu- 
ation requirements of the federally sponsored Comprehensive School Reform Demonstration (CSRD) 
program. The guide is also useful for schools and districts involved in other comprehensive school 
reform efforts and especially useful for those who don't have extensive evaluation experience. 

Herman, J.L., & Winters, L. (1992). Tracking your school's success: A guide to sensible evaluation. 
Newbury Park, CA: Corwin Press. 

This comprehensive guide offers educators step-by-step procedures and practical guidance needed 
to conduct sensible assessments and evaluations, and record and measure progress. It also instructs 
the reader on how to use evaluation information to aid in school planning and improve management 
decisions. 

King, J.A., Morris, L.L., & Fitz-Gibbon, C.T. (1987). How to assess program implementation. Newbury
Park, CA: Sage. 

This is part of the Sage series called The Program Evaluation Kit (2nd ed.). The series contains nine 
books written to guide and assist practitioners in planning and managing evaluations: (1) Evaluator's
handbook; (2) How to focus an evaluation; (3) How to design a program evaluation; (4) How to use 
qualitative methods in evaluation; (5) How to assess program implementation; (6) How to measure atti- 
tudes; (7) How to measure performance and use tests; (8) How to analyze data; and (9) How to com- 
municate evaluation findings. 

Pechman, E., Allen, S., Funkhouser, J., Kelliher, K., Rouk, U., & Rusnak, K. (1998). Implementing
schoolwide programs: Volume 1, an idea book on planning. Washington, DC: U.S. Department of 
Education. 

This book focuses on the issues of schoolwide program planning and combining resources. It contains 
many examples from various schools that illustrate the issues discussed. Thorough assessment of needs 
and schoolwide planning are essential for comprehensively upgrading the effectiveness of a school. Two
appendices provide tools for planning schoolwide programs and extensive information about print, video, 
and Internet resources available to planners. 

RMC Research Corporation. (1995). Schoolwide programs: A planning manual. Portland, OR: Author. 



Designed to help educators collect data on their school and plan and implement a schoolwide program,
this manual discusses the vision behind and advantages of a schoolwide program. It highlights
a four-step process for planning a schoolwide program: (1) conducting a comprehensive needs assess- 
ment; (2) managing the inquiry process; (3) designing the schoolwide program; and (4) evaluating 
the program. 

Sanders, J.R. (1992). Evaluating school programs: An educator's guide. Newbury Park, CA: Corwin Press. 

Here is a general guide to help in planning and conducting school program evaluations. The author 
guides the reader through each step in the evaluation process: how to focus the evaluation, and how 
to collect, organize, analyze, report, and use the information collected. 

Examples of State CSR Evaluation Plans 

Oregon Department of Education. (1999). Oregon Comprehensive School Reform Demonstration Program 
1999 state evaluation plan: Guidance and timeline. Salem, OR: Author.

The plan has two purposes: to document project activities and progress toward expected outcomes, 
and to summarize the overall progress of the reform program. 

Washington State Office of Superintendent of Public Instruction. (1999). Comprehensive School Reform
Demonstration Program local evaluation report. Olympia, WA: Author.

The purpose of this evaluation report is to "monitor and document CSRD program implementation; to 
assess progress toward expected outcomes; and to determine overall program effectiveness in improv- 
ing student achievement." 

For information about other state CSR evaluation plans, contact the state departments of education. 

Research Articles and Studies 

Glennan, T.K., Jr. (1998). New American Schools after six years. Santa Monica, CA: RAND. 

In July of 1991, New American Schools (NAS) was established to develop designs for what were termed 
"break the mold" schools. Its initial goal was to create designs to help schools enable students to reach 
high educational standards. It then moved to implement the new design in a significant number of 
schools as an element of a strategy for promoting wider education reform. This report describes 
RAND's perspectives on the evolution of NAS' mission. 

Kushman, J.W., & Yap, K.O. (1999). What makes the difference in school improvement? An impact 
study of Onward to Excellence in Mississippi schools. Journal of Education for Students Placed at 
Risk, 4(3), 277-298. 

The study examined the implementation of OTE and its impact on student achievement over a five- 
year period. The study concludes that implementation and retention were uneven across schools and 
that high-fidelity implementation appears to lead to positive results. The authors discuss the diffi- 
culties in implementing whole-school reform models and the factors that help or hinder success. 

Stringfield, S., Datnow, A., Ross, S.M., & Snively, F. (1998). Scaling up school restructuring in multi- 
cultural, multilingual contexts: Early observations from Sunland County. Education and Urban Society, 
30(3), 326-357. 



This study addresses three policy questions: (1) How effective are current school restructuring pro- 
grams in improving the achievement of students in schools with large numbers of language-minority 
students? (2) Are some models better suited to multilingual environments than others? (3) What ac- 
tions at the federal, state, district, and school level increase or decrease the probability of these schools 
obtaining full benefits from these models? 

Taylor, D.L., & Teddlie, C. (1999). Implementation fidelity in Title I schoolwide programs. Journal of 
Education for Students Placed At Risk, 4(3), 299-319. 

This study examines the extent to which schools that received Title I funds for schoolwide programs 
implemented the plans they developed. Findings showed that while schools implemented some of the 
plan components, such as hiring Title I teachers and teaching assistants, instructional innovations 
included in the plans were not implemented. The article concludes with specific recommendations for 
districts and schools. 

Wong, K.K., & Meyer, S.J. (1998). Title I schoolwide programs: A synthesis of findings from recent 
evaluation. Educational Evaluation and Policy Analysis, 20(2), 115-136. 

This article synthesizes what is known about Title I schoolwide programs, focusing on programmatic 
and organizational characteristics of schoolwide program schools and districts, and evidence of the 
effectiveness of schoolwide program schools, especially in terms of student performance. 

Online Publications and Resources

Herman, R., Aladjem, D., McMahon, P., Masem, E., Mulligan, I., O'Malley, A.S., Quinones, S., Reeve,
A., & Woodruff, D. (1999). An educator's guide to schoolwide reform. Arlington, VA: Educational 
Research Service. Retrieved June 14, 2000 from the World Wide Web: 
www.aasa.org/Reform/index.htm 

The American Institutes for Research (AIR) developed this guide for educators and others to use 
when investigating different approaches to school reform. It reviews the research on 24 "whole- 
school," "comprehensive," or "schoolwide" approaches. 

Comprehensive School Reform Demonstration [Web site] Northwest Regional Educational Laboratory, 
Portland, OR www.nwrel.org/csrdp/index.html 

This Web site offers descriptions of school reform models, contact information for service providers, a 
listing of Northwest school CSR sites, descriptions of the types of assistance available, and Internet 
links to articles about reform models.

Comprehensive School Reform Demonstration Program [Web site] U.S. Department of Education, 
Washington, DC www.ed.gov/offices/OESE/compreform/ 

This Web site includes a publications list, tools, state contacts, and other Web-site links related to CSRD.

Klein, S., Medrich, E., & Perez-Ferreiro, V. (1996). Fitting the pieces: Education reform that works. 
Washington, DC: U.S. Department of Education, Office of Educational Research and Improvement. 
Retrieved June 14, 2000 from the World Wide Web: www.ed.gov/pubs/SER/FTP 






An in-depth study of 12 education reform studies commissioned by the U.S. Department of Education.
Each study comprises three volumes. Volume I contains a discussion of the study, case study summaries 
of the schools or school districts examined, and recommendations. Volume II contains detailed case 
studies. Volume III is a technical appendix explaining the study's methodology. 

NWREL's Assessment & Evaluation Services [Web site]. Northwest Regional Educational Laboratory, 
Portland, OR www.nwrel.org/eval/index.html 

The Assessment and Evaluation Program translates for educators and community leaders the best re- 
search into practical, user-friendly resources and services for the assessment of educational results. 
The Web site contains a searchable database of assessment resources available for loan through the 
Assessment Resource Library. 

Quellmalz, E., Shields, P.M., Knapp, M.S., Hamburg, J.D., Anderson, L., Hawkins, E., Hill, L., Ruskus, 
J., & Wilson, C.L. (1995). School-based reform: Lessons from a national study. A guide for school 
reform teams. Washington, DC: U.S. Department of Education. Retrieved June 14, 2000 from the 
World Wide Web: http://ed.gov/pubs/Reform/index.html 

This national study, conducted by SRI International for the Planning and Evaluation Service of the 
U.S. Department of Education, examined effective school programs and other school-based reform ef- 
forts nationwide. This guide provides advice and specific examples based on the findings of the study. 

Videotape 

Ross, S., & Davis, D. (Presenters). (1999). Selecting and implementing comprehensive school reform 
programs [Videotape]. Portland, OR: Northwest Regional Educational Laboratory, Comprehensive 
School Reform Demonstration Program. 

This videotape provides detailed information on keys to selecting, implementing, and evaluating Com- 
prehensive School Reform Programs. Dr. Stephen Ross of the University of Memphis discusses the for- 
mative evaluation process for school reform programs and presents examples of evaluation instruments. 

Technical Assistance Providers 



U.S. Department of Education Regional Offices 

The U.S. Department of Education maintains 10 regional offices throughout the country. The follow- 
ing offices have representatives in each regional office: 

The Secretary's Regional Representative (SRR) and staff conduct departmental business on many issues. 
The Office of Postsecondary Education (OPE) handles questions related to student financial assistance 
programs. The Office of Special Education and Rehabilitative Services (OSERS) assists constituents with 
rehabilitative services. The Office for Civil Rights (OCR) responds to questions about, and reviews
complaints related to, civil rights issues. The Office of the Inspector General (OIG) investigates potential
violations of law and conducts audits on Department-funded programs. The Office of Management (OM) has
personnel offices or representatives in each of the regional offices. 

Additional information regarding the Regional Offices can be found at: www.ed.gov/pubs/TeachersGuide/ 
offices.html or by contacting the U.S. Department of Education. 






Comprehensive Regional Assistance Centers (CCs) 



The 15 Comprehensive Centers provide comprehensive training, technical assistance, and capacity build- 
ing to local education agencies, schools, tribes, states, and community-based organizations. Services 
are designed to help schools and districts focus on improving teaching and learning, especially in the 
development of schoolwide programs and programs that improve the opportunity for all children to 
meet challenging state content and student performance standards. These services include meeting 
the special needs of children served under the Improving America's Schools Act (IASA), including
children in high-poverty schools, migrant children, immigrant children, Native American children,
children with limited English proficiency, neglected or delinquent children, homeless children, and 
children with disabilities. 

Additional information regarding the Comprehensive Assistance Centers can be found at:
www.wested.org/cc/html/ccnetwork.htm or by contacting the U.S. Department of Education.

Regional Educational Laboratories 

The Regional Educational Laboratory Program is the U.S. Department of Education's largest research and 
development investment, designed to help educators, policymakers, and communities improve schools
and help all students attain their full potential. The network of 10 Laboratories works to ensure that
those involved in educational improvement at the local, state, and regional levels have access to the
best available research and knowledge from practice. A main priority that guides all Laboratory work
is helping educators and administrators expand systemic reform to benefit schools, and the educa- 
tional programs within them, in all communities. 

Additional information regarding the Regional Educational Laboratories can be found at:
www.relnetwork.org or by contacting the U.S. Department of Education.

Eisenhower Regional Math/Science Consortia

The 10 consortia provide technical assistance and disseminate information to teachers and other ed- 
ucators in implementing mathematics and science programs in accordance with state standards. 

For information on service providers in your region, please contact your state department of education 
or the U.S. Department of Education. 

Additional information regarding the consortia can be found at www.enc.org or by contacting the 
U.S. Department of Education. 







U.S. Department of Education 

Office of Educational Research and Improvement (OERI) 
National Library of Education (NLE) 
Educational Resources Information Center (ERIC) 




NOTICE 






REPRODUCTION BASIS 




This document is covered by a signed “Reproduction Release
(Blanket)” form (on file within the ERIC system), encompassing all
or classes of documents from its source organization and, therefore, 
does not require a “Specific Document” Release form. 




This document is Federally-funded, or carries its own permission to 
reproduce, or is otherwise in the public domain and, therefore, may 
be reproduced by ERIC without a signed Reproduction Release form 
(either “Specific Document” or “Blanket”). 




EFF-089 (9/97) 




